(57) Abstract

Double-stranded cDNA was synthesized from nucleic
acid extracted from Norwalk virus purified from stool specimens
of volunteers. Single-stranded RNA probes derived from
the DNA clone after subcloning into an in vitro transcription
vector were also used to show that the Norwalk virus contains
an ssRNA genome of about 8 kb in size. The availability of a
Norwalk-specific cDNA and the genome sequence information
allow rapid cloning of the entire genome and establishment of
sensitive diagnostic assays. Such assays can be based on detection
of Norwalk and Norwalk-related virus nucleic acids or
Norwalk and Norwalk-related viral antigens using probes or
primers and polyclonal or monoclonal antibodies to proteins
expressed from the cDNA or to synthetic peptides made based
on the knowledge of the genome sequence. Assays using proteins
deduced from the Norwalk virus genome and produced in
expression systems can measure antibody responses. Vaccines
for Norwalk and related viruses are made from an expressed
Norwalk virus protein.
Methods and Reagents To Detect and Characterize
Norwalk and Related Viruses 

This application is a Continuation-in-Part of Applicant's Co-Pending
U.S. Application Serial No. 07/443,492 filed November 8, 1989,
U.S. Application Serial No. 07/515,993, now abandoned, filed April 27,
1990, U.S. Application Serial No. 07/573,509 filed August 27, 1990, and
U.S. Application Serial No. 07/696,454 filed May 6, 1991, all entitled
"Methods and Reagents To Detect and Characterize Norwalk and Related
Viruses."

This invention is supported in part through grants or awards from
the Food and Drug Administration and the National Institutes of Health.
The United States Government may have certain rights to this invention.

Field of the Invention 
The present invention relates generally to synthesizing clones of
Norwalk virus and calicivirus and to making probes to Norwalk and
related viruses. It also relates to methods of detection and
characterization of Norwalk and related viruses.

Background of the Invention 
Norwalk virus is one of the most important viral pathogens causing
acute gastroenteritis, the second most common illness in the United States
(Dingle et al., Am. J. Hyg. 58:16-30 (1953); Kapikian and Chanock,
"Norwalk group of viruses" in B.N. Fields' 2d ed. of Virology, Raven Press,
New York, pp. 671-693 (1990)). Up to 42% of cases of adult viral
gastroenteritis have been estimated to be caused by Norwalk or
Norwalk-like viruses (Kaplan et al., Ann. Internal Med. 96(6):756-761
(1982)). Both waterborne and foodborne transmission of Norwalk virus have
been documented, and particularly large epidemic outbreaks of illness
have occurred following consumption of contaminated shellfish, including
clams, cockles, and oysters (Murphy et al., Med. J. Aust. 2:329-333 (1979);
Gunn et al., Am. J. Epidemiol. 115:348-351 (1982); Wilson et al., Am. J.
Public Health 72:72-74 (1982); Gill et al., Br. Med. J. 287:1532-1534
(1983); DuPont, New Engl. J. Med. 314:707-708 (1986); Morse et al., New
Engl. J. Med. 314:678-681 (1986); Sekine et al., Microbiol. Immunol.
33:207-217 (1989)). An increase in fish and shellfish-related food
poisonings has recently been noted and attributed to increased recognition
of these entities by clinicians as well as to increased consumption of
seafood (Eastaugh and Shepherd, Arch. Intern. Med. 149:1735-1740
(1989)).

Norwalk virus was discovered in 1973. Until recently, knowledge
about the virus has remained limited because it has failed to grow in cell
cultures and no suitable animal models have been found for virus
cultivation. Human stool samples obtained from outbreaks and from
human volunteer studies, therefore, are the only source of the virus. Still,
the concentration of the virus in stool is usually so low that virus
detection with routine electron microscopy is not possible (Dolin et al.,
Proc. Soc. Exp. Med. and Biol. 140:578-583 (1972); Kapikian et al., J.
Virol. 10:1075-1081 (1972); Thornhill et al., J. Infect. Dis. 132:28-34
(1975)). Current methods of Norwalk virus detection include immune
electron microscopy and other immunologic methods such as
radioimmunoassays (RIAs) or biotin-avidin enzyme-linked immunosorbent
assays (ELISAs), which utilize acute and convalescent phase serum from
humans. To date, no hyperimmune antiserum from animals has been
successfully prepared due either to insufficient quantities or unusual
properties of the viral antigen. Preliminary biophysical characterization
of virions has indicated particles contain one polypeptide (Greenberg et
al., J. Virol. 37:994-999 (1981)), but efforts to characterize the viral
genome have failed.

Viruses related to Norwalk virus include small round enteric
viruses, such as viruses with typical calicivirus morphology and the
astroviruses. The classification scheme for the human small enteric
viruses shown in Table 1 here is an updated version of a scheme outlined
by Caul and Appleton in the Journal of Medical Virology, 9:257-265
(1982). This system is referred to in Cubitt et al., J. Infectious Diseases,
156:806-814 (1987); Table 1 of the article by Appleton entitled "Small
round viruses: classification and role in food-borne infections", in the book
Novel Diarrhoea Viruses, Ciba Foundation Symposium No. 128, pp. 108-125
(John Wiley & Sons, N.Y. (1987)); and Table 1 of the chapter entitled
"Norwalk group of viruses" by Kapikian and Chanock from the book
Virology (B.N. Fields, 2d ed., Raven Press (1990)).

As shown in Table 1, human small round structured enteric viruses
include calicivirus and astrovirus. The recent sequencing of Norwalk
virus indicates that Norwalk virus is a calicivirus and has a genome
organization like that of other caliciviruses. In addition to the human
small round enteric viruses, there are a large number of non-human small
round viruses which have been classified as astroviruses, caliciviruses,
and small round structured viruses based upon their morphology. Examples
of these viruses are the primate calicivirus isolated from the pygmy
chimpanzee, described in the journal Science 221:79-81 (1983), a porcine
enteric calicivirus, described in the Journal of Clinical Microbiology
12:105-111 (1980), and bovine astroviruses described in Vet. Pathol.
21:208-215 (1984). Individual calicivirus types will at times exhibit host
specificity and tissue tropisms, but as an overall group they cause
gastroenteritis, hepatitis, abortion, skin lesions, pneumonia, myocarditis,
and encephalitis. The caliciviruses infecting humans fit in this context in
that Norwalk-like viruses cause gastroenteritis, hepatitis E virus causes
hepatitis, and San Miguel sea lion virus type 5 causes skin vesicles in
humans as well as infections in seals, fish, pigs and cattle (D. O. Matson,
"Calicivirus Infections" in Textbook of Pediatric Infectious Disease, 3d
ed., R. D. Feigin and J. D. Cherry, eds., W. B. Saunders, Philadelphia
(in press)).
Summary of the Invention
It is therefore an object of the invention to detect and characterize
the Norwalk and related virus genomes by synthesizing and cloning a
cDNA library.

It is an associated object of the invention to deduce amino acid
sequences from Norwalk and related viral cDNA.

Another object of the invention is to develop probes or primers to
confirm the genetic relationship between the Norwalk virus and the
Norwalk-related viruses.

Still another object of the invention is to develop a method of
preparing polyclonal and monoclonal antibodies to the Norwalk and
related viruses.

Yet still another object of the invention is to develop a method of
making probes to detect Norwalk and related viruses.

A further object of the invention is to use the cDNA or fragments
or derivatives thereof in assays to detect Norwalk and related viruses in
samples suspected of containing the viruses.

A still further object of the invention is to express proteins to
measure antibody responses.

A nucleotide sequence of the genome sense strand of the Norwalk
virus cDNA clone intended to accomplish the foregoing objects includes
the nucleotide sequence shown in Table 2. Within the Norwalk nucleotide
sequence are regions which encode proteins. The nucleotide sequence of
the Norwalk virus genome, its fragments and derivatives are used to make
diagnostic products, vaccines and antivirals.

Other and still further objects, features and advantages of the
present invention will be apparent from the following description of a
presently preferred embodiment of the invention.

Brief Description of the Figures 
Figure 1. EM picture of Norwalk and related viruses. Norwalk virus (A),
human calicivirus (B), small round structured virus (C), and human
astrovirus (D). The bar is 0.1 µm.

Figure 2a. Hybridization of stool samples with 32P-labeled plasmid DNA
for screening positive Norwalk cDNA clones. Nucleic acids from paired
stools [before (b) and after (a) infection with Norwalk virus] from two
volunteers (1 and 2) were dotted on Zetabind filters. Replicate strips were
prepared and hybridized at 50°C and 65°C with each test clone (pUC-27,
pUC-593, pUC-13 and pUCNV-953). One clone (pUCNV-953) which
reacted only with stool samples after (but not before) Norwalk infection
was considered as a potential positive clone and was chosen for further
characterization.

Figure 2b. Dot blot hybridization of clone 32P-labeled pUCNV-953 with
another 3 sets of stool samples collected at different times after infection
(B = before acute phase of illness; A = acute phase of illness; P =
post-acute phase of illness) of 3 volunteers. The nucleic acids were dotted
directly or after treatment with RNAse or with DNAse before dotting.
Double-stranded homologous cDNA (pUCNV-953) was dotted after the
same treatments as the stool samples.

Figure 3. Dot blot hybridization of Norwalk viruses in a CsCl gradient
with ssRNA probes made from pGEMNV-953. Aliquots of 50 µl from each
fraction in a CsCl gradient were dotted onto a Zetabind filter. Duplicate
filters were made and hybridized with the two ssRNA probes,
respectively. The two strands were subsequently called cRNA (positive
hybridization with the viral nucleic acid) and vRNA (no hybridization with
the viral nucleic acid, data not shown). The graph shows EM counts of
Norwalk viruses from each fraction of the same CsCl gradient used for the
dot blot hybridization. Five squares from each grid were counted and the
average number of viral particles per square was calculated.
Figure 4. The nucleotide sequence of the genome sense strand of the first
Norwalk virus cDNA clone. The deduced amino acid sequence of a long
open reading frame in this cDNA also is shown.

Figure 5. Schematic diagram of Norwalk cDNA clones. pUCNV-953 was
the first positive clone identified. Overlapping clones were determined by
restriction enzyme analyses and partial sequencing of the clones. AAA
indicates the poly(A) tail at the 3' end of the viral genome.

Figure 6. Norwalk virus encodes an RNA-directed RNA polymerase
sequence motif. The deduced amino acid sequence of a portion of Norwalk
virus pUCNV-4095 (NV) is compared with consensus amino acid residues
thought to encode putative RNA-directed RNA polymerases of hepatitis
E virus (HEV), hepatitis C virus (HCV), hepatitis A virus (HAV), Japanese
encephalitis virus (JE), poliovirus (polio), foot-and-mouth disease virus
(FMD), encephalomyocarditis virus (EMC), Sindbis virus (SNBV), tobacco
mosaic virus (TMV), alfalfa mosaic virus (AMV), brome mosaic virus
(BMV), and cowpea mosaic virus (CpMV). Sequences for viruses other
than NV are from Figure 3 of Reyes et al., Science 247:1335-1339 (1990).

Figure 7. Three pairs of initial primers used to amplify the Norwalk virus
genome. RNA was extracted from a stool sample (sample 543-11) by the
CTAB technique and amplified by RT-PCR. Lanes 1 and 5, 1-kb markers
from Bethesda Research Laboratories (the markers that migrated as 1.6,
1.0 and 0.5 kb are labeled); lane 2, PCR with Norwalk virus primers 8 and
9; lane 3, PCR with Norwalk primers 16 and 17; lane 4, PCR with
Norwalk primers 1 and 4. The amplified products were separated on the
agarose gel and visualized with UV light after staining with ethidium
bromide. The small product seen in lane 3 was made in variable amounts
in different experiments. The positions of the three primer pairs used in
this study are given above the autoradiograph. The numbers below the
map indicate the size (in base pairs) of the RT-PCR product.
Figure 8. This schematic shows the organization of the Norwalk genome
given in Table 2. The features shown here are based on analyses of the
nucleotide sequence of the Norwalk virus genome and the deduced amino
acid sequence of proteins encoded in the genome. The genome contains
7753 nucleotides including 111 A's at the 3'-end. Translation of the
sequence predicts that the genome encodes three open reading frames
(shown by the open boxes in the second line). The first open reading
frame is predicted to start from an initiation codon at nucleotide 146 and
it extends to nucleotide 5359 (excluding the termination codon). The
second open reading frame is initiated at nucleotide 5346 and it extends
to nucleotide 6935, and a third open reading frame exists between
nucleotides 6938 and 7573. Based on comparisons of these predicted
proteins with other proteins in the protein databank, the first open
reading frame encodes a protein that is eventually cleaved to make at
least three proteins. These three proteins include a picornavirus 2C-like
protein, a 3C-like protease and a 3D-like RNA-dependent RNA polymerase.
The second open reading frame encodes the capsid protein, which contains
sequence homology with the picornavirus VP3 protein.
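
The coordinates quoted in the Figure 8 legend can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses just the nucleotide positions stated above, treats them as 1-based inclusive positions (an assumption), and the function names are hypothetical. The legend states that the ORF1 end excludes the termination codon; it does not say so for ORF2 or ORF3, so their codon counts should be read with that caveat.

```python
# Coordinates copied from the Figure 8 legend (assumed 1-based, inclusive).
GENOME_LENGTH = 7753
ORFS = {
    "ORF1": (146, 5359),   # legend: end excludes the termination codon
    "ORF2": (5346, 6935),
    "ORF3": (6938, 7573),
}


def orf_length(start, end):
    """Number of nucleotides in an inclusive 1-based interval."""
    return end - start + 1


def reading_frame(start):
    """Frame (0, 1 or 2) of the first codon position, counted from nucleotide 1."""
    return (start - 1) % 3


def overlap(a, b):
    """Nucleotides shared by two inclusive intervals (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def summarize(orfs):
    """Per-ORF length, whole-codon count and reading frame."""
    table = {}
    for name, (start, end) in orfs.items():
        length = orf_length(start, end)
        table[name] = {
            "nt": length,
            "codons": length // 3,
            "whole_codons": length % 3 == 0,
            "frame": reading_frame(start),
        }
    return table


if __name__ == "__main__":
    for name, row in summarize(ORFS).items():
        print(name, row)
    print("ORF1/ORF2 overlap (nt):", overlap(ORFS["ORF1"], ORFS["ORF2"]))
```

With the stated coordinates, each interval is a whole number of codons, ORF1 and ORF2 share 14 nucleotides, and ORF2 lies in a different reading frame from ORF1.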

Figure 9. Nucleotide and amino acid sequence of human calicivirus
Sapporo cDNAs. The 551 nucleotide known sequence of human calicivirus
Sapporo (HuCV Sapporo) is presented in its entirety. Below the
nucleotide sequence is the amino acid sequence for HuCV Sapporo. Above
the HuCV Sapporo nucleotide sequence is the sequence of the cDNA from
a Houston day care center outbreak (Day care). In the Day care sequence
a V indicates the nucleotide is identical to the HuCV Sapporo nucleotide
at that site. Where a nucleotide difference occurred in the Day care
sequence, a new letter is indicated at that position. "N" indicates
uncertainty of the nucleotide at that site. Below the HuCV Sapporo
amino acid sequence are arrows, indicating the extent of cDNAs at23s2m31
and c-29_4-gel (which together contribute to the 551 nucleotides of the
known sequence) and the new 36 primer (see Table 6).
Figure 10. Nucleotide homologies between calicivirus cDNAs and
calicivirus strains with known sequences. All comparisons are in
reference to the sequence of human calicivirus Sapporo. The length of the
baseline indicates the known sequence region. The boxes indicate areas
of nucleotide sequence homology between HuCV Sapporo and the
indicated strain. The length of the box indicates the part of the indicated
strain where homology exists and the height of the box indicates the
strength of the homology. SD = standard deviation. SD 3 or greater is
significant. The numbers under the Norwalk homology box indicate the
region of the Norwalk virus genome where homology was observed.

Figure 11. Strategy used to obtain nucleotide sequence of the Norwalk-
related virus SRSV/KY/89 using primers from the Norwalk virus sequence.
This figure shows a partial schematic of the Norwalk virus genome and
the predicted ORF1 showing the location of the 3D-like polymerase region,
the second ORF showing the location of the VP3-like domain and the start
of ORF 3. On the bottom, the solid lines show regions of KY89 sequenced
based on using primer sets (see numbers such as 36 and 35, etc.) chosen
from the sequence of the Norwalk virus genome.

Figure 12. Comparison of the Norwalk virus nucleotide sequence with the
Norwalk virus-related virus SRSV/KY/89 nucleotide sequence. Part of the
nucleotide sequence of Norwalk-related virus SRSV/KY/89 was determined
using primers from the Norwalk-virus (NV) genome. Primers from the
NV genome used to obtain the sequence of this Norwalk-related virus are
shown in Table 6. Some of these primers were modified based on the
initial nucleotide sequence obtained from the SRSV/KY/89 to obtain the
rest of the sequence of SRSV/KY/89. The primers shown here and in
Table 6 are used by way of example only; other NV primers can be used.

Figure 13. Comparison of deduced amino acid sequence of proteins of the
Norwalk virus and the Norwalk-related virus SRSV/KY/89. The protein
sequence of SRSV/KY/89 was deduced from the nucleotide sequence shown
in Figure 12. Figure 13a shows a comparison of the deduced amino acid
sequence of ORF2, the capsid, of SRSV/KY/89 with the same region
encoded in the Norwalk virus genome. Figure 13b shows a comparison of
the deduced amino acid sequence of part of the polymerase protein of
SRSV/KY/89 with that of Norwalk virus. Comparisons of similar
sequences from other Norwalk-related viruses will permit discovery of
conserved and divergent regions including antigenic regions. The
information will rapidly permit choices of broadly reactive primers to
detect all Norwalk-related viruses and specific primer sets to detect
individual Norwalk-related viruses. Similarly, fragments and peptides
with common amino acid sequences or specific amino acid sequences can
be selected for development of diagnostics, vaccines and antivirals.

Figure 14. Comparison of partial nucleotide sequences of Norwalk virus
and six Norwalk-related viruses obtained using primers from the NV
genome. Sequences from SRSV/CDC 6/91, SRSV/UT/88, SMA/78;
SRSV/Cambridge, UK/92, SRSV/CDC 32, Norwalk virus/68, SRSV-3/88,
SRSV/KY89/89. Figures 14a and 14b show two different regions of the
genome.

Figure 15. Expression of the Norwalk virus capsid protein. Baculovirus
recombinants (C-6 and C-8) that contain a subgenomic piece of Norwalk
virus DNA (from nucleotides 5337 to 7753) were selected and used to
infect insect (Spodoptera frugiperda) cells at a multiplicity of infection of
10 PFU/cell. After 4 days of incubation at 27°C, the infected cells were
harvested and the proteins were analyzed by electrophoresis on 12%
polyacrylamide gels. The proteins were visualized after staining with
Coomassie blue. The Norwalk-expressed protein (highlighted by the
arrowhead) is seen only in the recombinant-infected cells, and not in
wild-type baculovirus (wt) or mock-infected (m) insect cells.
Figure 16. The Norwalk virus expressed protein shows immunoreactivity
with sera from volunteers infected with Norwalk virus. The expressed
protein shown in Figure 15 was adsorbed onto the wells of a 96-well
ELISA plate and its reactivity was tested with dilutions of serum samples
taken from volunteers before (pre) and three weeks after (post) infection
with Norwalk virus. After an incubation at 37°C for 2 hours, a
peroxidase-conjugated goat anti-human IgG, IgM and IgA serum was
added and reactivity was subsequently observed by reading the optical
density at 414 nm after addition of the substrate. The data show that
post-infection sera reacted strongly with the expressed antigen at serum
dilutions of 1:100 and 1:1000, and some sera were still specifically reactive
at a dilution of 1:10,000.

Figure 17. Baculovirus recombinants containing the 3'-end of the
Norwalk genome produce virus-like particles in insect cells. Lysates from
insect cells infected with baculovirus recombinant C-8 (see Figure 15) were
analyzed by electron microscopy and shown to contain numerous virus-like
particles. These particles are the same size as virus particles obtained
from the stools of volunteers infected with Norwalk virus. Bar = 50 nm.

Figure 18. Norwalk virus-like particles can be purified in gradients of
CsCl. Supernatants of insect cells infected with the baculovirus
recombinant C-8 were processed by extraction with genetron and PEG
precipitation, and virus eluted from these PEG pellets was centrifuged in
a CsCl gradient in a SW50.1 rotor for 24 hours at 4°C. The gradient was
fractionated and material in each fraction was adsorbed onto two wells of
an ELISA plate. Duplicate wells were then treated either with pre- or
post-infection serum, peroxidase-conjugated goat anti-human serum and
substrate, and the reactions were monitored by reading the OD414nm. A
peak was observed in the gradient at a density of 1.31 g/cm³ and this peak
was shown to contain virus-like particles by electron microscopy. This
peak also contained a major protein of an approximate molecular weight
of 58,500 that co-migrated with the protein expressed in the insect cells
from the same baculovirus recombinant.

Figure 19. Use of the expressed virus-like particles to measure the
reactivity of pre- and post-serum samples from volunteers infected with
Norwalk virus shows that most volunteers have an immune response.
Volunteer 6, who did not show an immune response, also did not become
ill after being administered virus.

Figure 20. Partial sequence of the primate Pan paniscus cDNA atprcvw2. 

Detailed Description of the Invention 

It is readily apparent to one skilled in the art that various
substitutions and modifications may be made to the invention disclosed
herein without departing from the scope and spirit of the invention.

The term "fragment" as used herein is defined as any portion of the
Norwalk virus genome or a subgenomic clone of the Norwalk virus that
is required to be expressed to produce or encodes a peptide which in turn
is able to induce a polyclonal or monoclonal antibody. It is possible a
peptide of only 5 amino acids could be immunogenic but usually peptides
of 15 amino acids or longer are required. This depends on the properties
of the peptide and it cannot be predicted in advance.

The term "derivative" as used herein is defined as larger pieces of
DNA or an additional cDNA which represents the Norwalk virus genome
and which is detected by direct or sequential use of the original cDNA and
any deduced amino acid sequences thereof. Clone pUCNV-1011, therefore,
is a derivative, although it does not overlap or share sequences with the
original clone. Also included within the definition of derivative are RNA
counterparts of DNA fragments and DNA or cDNA fragments in which
one or more bases have been substituted or to which labels and end
structures have been added without affecting the reading or expression of
the DNA or cDNA.



WO 94/05700 



PCT/US93/08447 




The terms Norwalk "related viruses" and "Norwalk-like viruses" as
used herein are defined as human and non-human calicivirus, astrovirus
and small round structured viruses (SRSV). As the genomic sequences of
most of these viruses are not known, this classification is based on
morphology as described by Caul and Appleton in the Journal of Medical
Virology, 9:257-265 (1982); by Appleton in the article entitled "Small
round viruses: classification and role in food-borne infections", in the book
Novel Diarrhoea Viruses, Ciba Foundation Symposium No. 128, pp. 108-125
(John Wiley & Sons, N.Y. (1987)); and by Kapikian and Chanock in
the chapter entitled "Norwalk group of viruses" from the book Virology
(B.N. Fields, 2d ed., Raven Press (1990)). As the genomic sequences of the
viruses become known, those skilled in the art will be able to determine
Norwalk-related viruses and Norwalk-like viruses based on nucleotide
homologies.

Within the Norwalk-related viruses is a subgroup of viruses referred
to herein as the SRSVs or the Norwalk group. The Norwalk group
includes Snow Mountain Agent (SMA), Hawaii Agent, Taunton Agent,
Amulree, Otofuke, and Montgomery County Agent. The Norwalk group
is characterized by small, round, structured viruses with an amorphous
surface or ragged outline.

Production of Norwalk Virus for Molecular Cloning 

Norwalk virus was produced by administration of safety tested
Norwalk virus (8FIIa) to adult volunteers. The virus inoculum used in the
volunteer study was kindly supplied by Dr. Albert Kapikian (Laboratory
of Allergy and Infectious Diseases, National Institutes of Health, Bethesda,
MD). This virus originated from an outbreak of acute gastroenteritis in
Norwalk, Ohio (Dolin et al., 1971). Two ml of a 1 to 100 dilution of 8FIIa
in TBS was administered orally to each individual with 80 ml of milli-Q
water (Millipore, Bedford, MA 01730). Sodium bicarbonate solution was
taken by each person 2 minutes before and 5 minutes after virus
administration. The volunteer studies were approved by the Institutional
Review Board for Human Research at Baylor College of Medicine, at the
Methodist Hospital and at the General Clinical Research Center. The
virus was administered to the volunteers in the General Clinical Research
Center where the volunteers were hospitalized and under extensive
medical care for 4 days. All stools were collected and kept at -70°C for
later use.

Purification of Norwalk Viruses from Stool Samples

A 10% solution of stool samples in TBS was clarified by low speed
centrifugation at 3000 rpm for 15 minutes. The resulting supernate then
was extracted two to three times with genetron in the presence of 0.5%
Zwittergent 3-14 detergent (Calbiochem Corp., La Jolla, CA). Viruses in
the aqueous phase were concentrated by pelleting at 36,000 rpm for 90
minutes through a 40% sucrose cushion in a 50.2 Ti rotor (Beckman
Instruments, Inc., Palo Alto, CA 94304). The pellets were suspended in
TBS and mixed with CsCl solution (refractive index 1.368) and centrifuged
at about 35,000 rpm for about 24 hours in a SW50.1 rotor (Beckman).
The CsCl gradient was fractionated by bottom puncture and each fraction
was monitored for virus by EM examination. The peak fractions
containing Norwalk virus were pooled and CsCl in the samples was
diluted with TBS and removed by pelleting the viruses at about 35,000
rpm for 1 hour. The purified virus was stored at about -70°C.

Extraction of Nucleic Acids from Purified Virus 

One method of extraction involved treating purified Norwalk virus
from CsCl gradients with proteinase K (400 µg/ml) in proteinase K buffer
(0.1 M Tris-Cl pH 7.5, 12.5 mM EDTA, 0.15 M NaCl, 1% w/v SDS) at
about 37°C for about 30 minutes. The samples were then extracted once
with phenol-chloroform and once with chloroform. Nucleic acids in the
aqueous phase were concentrated by precipitation with 2.5 volumes of
ethanol in the presence of 0.2 M NaOAc followed by pelleting for 15
minutes in a microcentrifuge.
cDNA Synthesis and Cloning of Amplified cDNA

One method of synthesis and cloning included denaturing nucleic
acids extracted from the purified Norwalk viruses with 10 mM CH3HgOH.
Then cDNA was synthesized using the cDNA synthesis kit with the
supplied random hexanucleotide primer (Amersham, Arlington Heights,
IL 60005). After the second strand synthesis, the reaction mixture was
extracted once with phenol-chloroform and once with chloroform followed
by ethanol precipitation. Amplification of DNA was performed using the
random prime kit for DNA labeling (Promega Corp., Madison, WI
53711-5305). Eight cycles of denaturation (100°C for 2 minutes),
reannealing (2 minutes cooling to room temperature) and elongation
(room temperature for 30 minutes) were performed after addition of
Klenow fragment (Promega Corp.). A DNA library was constructed in
pUC-13 with blunt-end ligation into the Sma I site.

Screening of the Library for Positive Clones

As one method of screening, white colonies from transformed DH5
alpha bacterial cells (BRL) were picked and both a master plate and
minipreps of plasmid DNA were prepared for each clone. Clones
containing inserts were identified after electrophoresis of the plasmid
DNA in an agarose gel. The insert DNA in the agarose gel was cut out
and labeled with 32P using random primers and Klenow DNA polymerase
such as in the PRIME-A-GENE® labeling system (Promega Corp.). Other
isotopic or biochemical labels, such as enzymes, and fluorescent,
chemiluminescent or bioluminescent substrates can also be used. Nucleic
acids extracted from paired stool samples (before and after Norwalk
infection) from two volunteers (543 and 544) were dotted onto Zetabind
filters (AFM, Cuno, Meriden, CT). Replicate filter strips were prepared
and hybridized with each labeled plasmid probe individually at 65°C
without formamide. Potential positive clones were judged by their
different reactions with the pre- and post-infection stools. Clones which
reacted with post- (but not pre-) infection stools of volunteers were
considered positive and these clones on the master plates were
characterized further. Once one Norwalk clone was identified, it was used
to rescreen the cDNA library to identify additional overlapping clones.
Rescreening the cDNA library with these additional clones can ultimately
identify clones representing the entire Norwalk virus genome.

Reverse Transcriptase-Polymerase Chain Reaction Production of cDNA
Clones from Viruses Related to Norwalk Virus

One method for producing cDNA clones of viruses related to
Norwalk virus using the knowledge of the Norwalk virus genome sequence
is the reverse transcription-polymerase chain reaction method. In this
procedure, RNA was extracted from 300 µl of specimen containing the
related virus. Complementary DNA was prepared by reverse
transcriptase-polymerase chain reaction (RT-PCR) using a primer pair (for
example primers 36 and 35 shown in Table 6) derived from the sequence
of Norwalk virus. The resulting product was ligated into a plasmid vector
and transfected into E. coli. Plasmids then were partially purified from
the bacteria and the inserted PCR product was sequenced in the plasmid
by dideoxy chain termination to examine the relation to Norwalk virus by
nucleotide and predicted protein homology.
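
As a general illustration of how a primer pair is read off a known sense-strand sequence, the sketch below takes the forward primer directly from the sense strand and the reverse primer as the reverse complement of the downstream end of the target region. It is a minimal sketch under stated assumptions, not the procedure used for the primers in Table 6: the function names, the primer length and the toy sequence are all hypothetical.

```python
# Watson-Crick complement for DNA letters; input is uppercased first.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primer_pair(sense, fwd_start, rev_end, length=20):
    """Pick a primer pair flanking sense[fwd_start..rev_end] (1-based, inclusive).

    The forward primer is the first `length` bases of the region on the
    sense strand; the reverse primer is the reverse complement of the last
    `length` bases, so both primers read 5' to 3'.
    """
    if rev_end - fwd_start + 1 < 2 * length:
        raise ValueError("region too short for two non-overlapping primers")
    forward = sense[fwd_start - 1 : fwd_start - 1 + length].upper()
    reverse = reverse_complement(sense[rev_end - length : rev_end])
    product_size = rev_end - fwd_start + 1
    return forward, reverse, product_size


if __name__ == "__main__":
    # Toy sequence for illustration only; it is not Norwalk virus sequence.
    toy = "ATGGCCAAGTTTCCGGATACCGTTAGCTAGGCTAACGTTGCA"
    print(primer_pair(toy, 1, len(toy), length=8))
```

The expected product size follows directly from the two outer coordinates, which is how the sizes shown below the map in Figure 7 would be obtained from primer positions.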

The following examples are offered by way of illustration and are not
intended to limit the invention in any manner.

Example 1 
Electron micrograph confirmation 
To permit better diagnosis and molecular characterization of
Norwalk virus and related viruses, a cDNA library for Norwalk was
derived from nucleic acid extracted from virions purified from stool
samples. Norwalk virus was purified with methods used previously for
hepatitis A and rotaviruses from stool samples with some modifications
(Jiang et al., 1986). Basically, stool samples obtained from volunteers
administered Norwalk virus were treated with genetron to remove lipid
and water insoluble materials. Virus in the aqueous phase was then
pelleted through a 40% sucrose cushion. The resulting pellets were
resuspended, sonicated and loaded in a CsCl gradient for isopycnic
centrifugation. 

Figure 1 shows electron micrographs of purified Norwalk viruses
isolated by the above procedure and Norwalk-related viruses used to
produce cDNAs using RT-PCR.

Example 2 

Initial cDNA synthesis, cloning and screening 
A cDNA library was generated from nucleic acids extracted from
these purified viruses by proteinase K treatment of the samples followed
by phenol-chloroform extraction and ethanol precipitation (Jiang et al.,
1986; 1987). Because the nature of the viral genome was unknown, the
extracted nucleic acids were denatured with methylmercuric hydroxide
before cDNA synthesis. Random primed cDNA was synthesized with the
Gubler-Hoffman method (cDNA synthesis system plus, Amersham) and a
small amount of cDNA was obtained. Direct cloning of this small amount
of cDNA was unsuccessful. Therefore, the DNA was amplified before
cloning by synthesizing more copies of the DNA with random
primers and the Klenow fragment of DNA polymerase. The
procedure involved cycles of denaturation, addition of random primers and
the Klenow fragment of DNA polymerase, reannealing and elongation.
With this procedure, a linear incorporation of labeled nucleotides into
product was observed as the number of cycles of synthesis was increased.
The number of cycles performed was limited (<10) to avoid the synthesis
of an excess of smaller fragments. In the case of Norwalk cDNA, eight
cycles of amplification were performed and approximately 2.5 ug of DNA
were obtained, which was at least a 100-fold amplification of the starting
template cDNA. This amplified cDNA was cloned into pUC-13 by
blunt-end ligation and a positive clone (pUCNV-953) was isolated.
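
The yield figures above imply a bound on the starting material. The following is a minimal sketch of that arithmetic only; the variable names are illustrative, and the comparison with ideal doubling is an assumption added for context, since the text reports linear rather than exponential incorporation.

```python
# Arithmetic implied by the amplification step described above.
# All names are illustrative; only the numeric inputs come from the text.

final_yield_ug = 2.5       # approximate DNA obtained after amplification
min_fold = 100             # "at least a 100-fold amplification"
cycles = 8                 # cycles of random-primed Klenow synthesis

# If the yield is at least 100x the template, the template was at most:
max_template_ng = final_yield_ug * 1000 / min_fold

# For comparison only (an assumption, not a claim of the text): perfect
# doubling at every cycle would give this fold increase.
ideal_doubling_fold = 2 ** cycles

print(f"starting template: at most {max_template_ng:.0f} ng")
print(f"ideal doubling over {cycles} cycles: {ideal_doubling_fold}-fold")
```

The reported increase of at least 100-fold in eight cycles is below the ideal doubling ceiling, which is consistent with the linear incorporation noted in the text.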

To obtain the positive Norwalk virus clone, minipreparations of the
plasmid DNAs containing potential inserts were screened by agarose gel
electrophoresis. Inserts of the larger clones in the gel were cut out and
probes were made with the DNA in the gel using the PRIME-A-GENE®
labeling system (Promega Corp.). These probes were hybridized
individually with paired stool samples (before and after Norwalk infection)
from two volunteers (Figure 2a). One clone (pUCNV-953) reacted with
post- but not pre-infection stool samples from both volunteers.

Example 3

Confirmation of viral origin of the clone pUCNV-953 
To further confirm the viral origin of the clone pUCNV-953, six
more paired stool samples were tested and the same results were obtained.
Figure 2b shows a dot blot hybridization of the clone with stool samples
collected at different times post-infection. Strong signals
were observed only with stools from the acute phase, but not before or after
the illness. This result was consistent with previous RIA assays for viral
antigen detection using convalescent sera from volunteers with Norwalk
diarrhea and immune electron microscopy (IEM) studies of the samples
for viral particle examination. This result also agrees with the patterns
of virus shedding in stool in the course of the disease (Thornhill et al.,
1975). When the pUCNV-953 clone was hybridized with fractions of a
CsCl gradient from the Norwalk virus purification scheme, an excellent
correlation between hybridization and EM viral particle counts was
observed (Figure 3). The peaks of the hybridization signals and viral
particle counts both were at fractions with a density of 1.38 g/cm³, which
agrees with previous reports of the biophysical properties of Norwalk
virus. Finally, the clone was tested by hybridization with highly purified
Norwalk virus electrophoresed on an agarose gel. A single hybridization
band was observed with Norwalk virus but not with HAV and rotavirus.
Sequence analysis of the pUCNV-953 cDNA showed this clone is 511 bp
(Figure 4). This partial genomic cDNA encodes a potential open reading
frame for which the amino acid sequence has been deduced (Figure 4). No
significant nucleotide or deduced amino acid sequence homology was
found by comparison with other sequences in GenBank (Molecular
Biology Information Resource, Eugene Software, Baylor College of
Medicine).

Example 4

Use of Norwalk virus cDNA to characterize the viral genome
The pUCNV-953 cDNA was subcloned into the transcription vector
pGEM-3Zf(+) and grown. ssRNA probes were then generated by in vitro
transcription using SP6 and T7 polymerases (Bethesda Research
Laboratory). When two opposite sense ssRNA probes were hybridized
with the viral nucleic acid separately, only one strand reacted with the
virus, indicating the viral genome is single-stranded. As shown in Figure
2b, the hybridization signals were removed by treatment of the viral
nucleic acid with RNAse (but not with DNAse) before loading them onto
the filters, indicating the virus genome contains ssRNA. A long open
reading frame was found in one of the two strands of the inserted DNA
by the computer analysis of the sequences of pUCNV-953. The ssRNA
probe with the same sequence as this coding strand does not react with
the viral nucleic acid, but the complementary ssRNA probe does react in
the hybridization tests. Therefore, Norwalk virus contains a positive
sense single-stranded RNA genome. The size of the genome of Norwalk
virus was estimated to be about 8 kb based on comparisons of the
migration rate of the purified viral RNA in agarose gels with molecular
weight markers.
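
The inference drawn in the preceding paragraph can be written out as a small decision function. This is a sketch of the reasoning only; the function name and its boolean inputs are hypothetical and are not part of the described procedure.

```python
def infer_genome(coding_sense_probe_binds, complementary_probe_binds,
                 rnase_removes_signal, dnase_removes_signal):
    """Summarize genome type from strand-specific probe and nuclease results.

    Hypothetical helper: inputs are the four qualitative observations
    described in the text, expressed as booleans.
    """
    # Strandedness: both opposite-sense probes bind a double-stranded
    # genome, but only one of them binds a single-stranded genome.
    if coding_sense_probe_binds and complementary_probe_binds:
        strandedness = "double-stranded"
    elif coding_sense_probe_binds or complementary_probe_binds:
        strandedness = "single-stranded"
    else:
        strandedness = "undetermined"

    # Nucleic acid type: signal lost after RNAse but not after DNAse.
    if rnase_removes_signal and not dnase_removes_signal:
        nucleic_acid = "RNA"
    elif dnase_removes_signal and not rnase_removes_signal:
        nucleic_acid = "DNA"
    else:
        nucleic_acid = "undetermined"

    # Polarity: a probe hybridizes to its complement, so if only the
    # probe complementary to the coding strand binds, the genome itself
    # carries the coding (positive) sense sequence.
    if strandedness != "single-stranded":
        polarity = "not applicable"
    elif complementary_probe_binds:
        polarity = "positive"
    else:
        polarity = "negative"

    return strandedness, nucleic_acid, polarity


# Observations reported above for Norwalk virus.
norwalk_call = infer_genome(
    coding_sense_probe_binds=False,
    complementary_probe_binds=True,
    rnase_removes_signal=True,
    dnase_removes_signal=False,
)
print(norwalk_call)
```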

The pUCNV-953 cDNA was used to rescreen a second cDNA library
made as follows. Nucleic acid was isolated from purified Norwalk virus;
cDNA was synthesized using reverse transcriptase and random primers; a
second strand of DNA was synthesized from the cDNA; and at least one
copy of DNA was inserted into a plasmid or a cloning and expression
vector. Screening the library with the original pUCNV-953 cDNA
identified clones containing fragments of (or the complete) Norwalk or
related genome. Alternatively, at least one copy of DNA was inserted in
a cloning and expression vector, such as lambda ZAPII® (Stratagene Inc.),
and the cDNA library was screened to identify recombinant phage
containing fragments of or the complete Norwalk or related genome.
Additional cDNAs were made and found with this method. Use of these
additional cDNAs to rescreen the library resulted in detection of new
clones (Figure 5).

Thus, those skilled in the art will recognize that the entire Norwalk
virus cDNA sequence, or fragments or derivatives thereof, can be used in
assays to detect the genome of Norwalk and other related viruses. The
detection assays include labeled cDNA or ssRNA probes for direct
detection of the Norwalk virus genome and measurement of the amount
of probe binding. Alternatively, primers or small oligonucleotide probes
(10 nucleotides or greater) and polymerase chain reaction amplification
are used to detect the Norwalk and Norwalk-related virus genomes.
Expression of the open reading frame in the cDNA is used to make
hyperimmune or monoclonal antibodies for use in diagnostic products,
vaccines and antivirals.

Using the above methodology, the nucleotide sequence in Table 2
was identified. Within that nucleotide sequence, the encoding regions for
several proteins have been identified. In that sequence, the first protein
is encoded by nucleotides 146 through 5339 and the amino acid sequence
is shown in Table 3. This first protein is eventually cleaved to make at
least three proteins including a picornavirus 2C-like protein, a 3C-like
protease and an RNA-dependent RNA polymerase. The RNA-dependent
RNA polymerase is deduced from nucleotides 4543 to 4924 of the Norwalk
virus genome as shown in Table 3. The fact that this portion of the
genome contains an RNA polymerase is verified by comparisons with RNA
polymerase in other positive sense RNA viruses (Figure 6, SEQ ID NOS
38 through 50).

Also in the sequence in Table 2, two other protein encoding regions 
were found. They are encoded by nucleotides 5346 through 6935 and 
nucleotides 6938 through 7573. The amino acid sequences for these two 
proteins are shown in Tables 4 and 5, respectively. 
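
As a quick check on the two coordinate ranges just given, the sketch below converts each inclusive nucleotide range into a length and a whole number of codons. The dictionary labels are illustrative, and whether each range includes a terminal stop codon is not stated in the text, so the codon count is not necessarily the number of amino acids.

```python
# Inclusive nucleotide ranges quoted in the paragraph above.
# The labels are illustrative; the coordinates come from the text.
regions = {
    "Table 4 protein": (5346, 6935),
    "Table 5 protein": (6938, 7573),
}

def region_length(start, end):
    """Length of an inclusive, 1-based nucleotide range."""
    return end - start + 1

summary = {}
for name, (start, end) in regions.items():
    length = region_length(start, end)
    codons, leftover = divmod(length, 3)
    summary[name] = (length, codons, leftover)
    print(f"{name}: {length} nt = {codons} codons, {leftover} nt left over")
```

Both ranges are exact multiples of three, as expected for protein encoding regions.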

Example 5

Diagnostic assays based on detection of the 
sequences of the Norwalk virus genome 
Hybridization assays are the assays of choice to detect Norwalk virus
because small amounts of virus are present in clinical or contaminated
water and food specimens. Previously, detection of Norwalk and related
nucleic acids was not possible because the genome of Norwalk virus was
not known and no sequence information was available. Probes made from
the Norwalk virus cDNA or primers made from the Norwalk virus genome
sequence allow methods that amplify the genome to be established for
diagnostic products. Probes to identify Norwalk virus alone and to identify
other Norwalk-related viruses enable development of either specific assays
for Norwalk or general assays to detect sequences common to many or all
of the Norwalk-related agents.

In the past, one major difficulty encountered in RT-PCR detection
of viral RNA in stool samples was that uncharacterized factor(s) are
present in stools which inhibit the enzymatic activity of both reverse
transcriptase and Taq polymerase (Wilde et al., J. Clin. Microbiol.
28:1300-1307, 1990). These factor(s) were difficult to remove by routine
methods of nucleic acid extraction. Techniques were developed using
cetyltrimethylammonium bromide (CTAB) and oligo d(T) cellulose
specifically to separate viral RNA from the inhibitory factor(s). These
techniques were based on the unique properties of CTAB which selectively
precipitates nucleic acid while leaving acid insoluble polysaccharide in the
supernatant. The resulting nucleic acid was further purified by adsorption
onto and elution from oligo d(T) cellulose. This step removes unrelated
nucleic acids that lack a poly(A) tail. With this technique, Norwalk virus
was detected easily by PCR in very small amounts (400 ul of a 10%
suspension) of stool sample. For example, one skilled in the art will
recognize that it is now possible to clone the genome of RNA viruses
present in low concentrations in small amounts of stool after RT-PCR and
a step of amplification of the viral RNA by RT-PCR using random
primers. In some cases, RT-PCR active nucleic acids are extracted with
CTAB and without oligo d(T) cellulose. In addition, now that the
inhibitor(s) can be removed from stool, it is also possible to detect and
clone nucleic acids of other viruses (DNA viruses, non-poly(A) tailed RNA
viruses) present in stool.

The CTAB and oligo d(T) cellulose technique of extraction followed
by detection of viral RNA with RT-PCR was used on stool samples and
could be used on water and food samples. Stool sample was suspended in
distilled water (about 10% wt/vol) and extracted once with genetron.
Viruses in the supernatant were precipitated with polyethylene glycol at
a final concentration of about 8%. The viral pellets were treated with
proteinase K (about 400 ug/ml) in the presence of SDS at about 37°C for
about 30 minutes followed by one extraction with phenol chloroform and
one with chloroform. A solution of about 5% CTAB and about 0.4M NaCl
was added at a ratio of sample:CTAB equal to about 5:2. After incubation
at about room temperature for about 15 minutes and at about 45°C for
about 5 minutes, the nucleic acids (including the viral RNA) were collected
by centrifugation in a microcentrifuge for about 30 minutes. The
resultant pellets were suspended in about 1M NaCl and extracted twice
with chloroform. The viral RNA in the aqueous phase was used directly
in RT-PCR reactions or further purified by adsorption/elution on oligo
d(T) cellulose.
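
For readers scaling the CTAB step, the sketch below works out the volume of CTAB solution and the resulting final concentrations for the approximate 5:2 ratio given above. The 500 ul sample volume is a hypothetical example, and the calculation assumes the sample itself contributes no CTAB or NaCl.

```python
def ctab_mix(sample_ul, stock_ctab_pct=5.0, stock_nacl_m=0.4,
             sample_parts=5, ctab_parts=2):
    """Volumes and final concentrations for the sample:CTAB mixing step.

    Hypothetical helper. Assumes the sample contributes no CTAB or NaCl,
    so final concentrations come from dilution of the stock alone.
    """
    ctab_ul = sample_ul * ctab_parts / sample_parts
    total_ul = sample_ul + ctab_ul
    dilution = ctab_ul / total_ul          # fraction of total that is stock
    return {
        "ctab_solution_ul": ctab_ul,
        "total_ul": total_ul,
        "final_ctab_pct": stock_ctab_pct * dilution,
        "final_nacl_m": stock_nacl_m * dilution,
    }


# Example with an assumed 500 ul sample (not a volume stated in the text).
mix = ctab_mix(500)
print(mix)
```

With these assumptions the 5:2 ratio dilutes the stock to two sevenths of its starting strength, that is, roughly 1.4% CTAB and 0.11M NaCl in the mixture.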

A batch method of adsorption/elution on oligo d(T) cellulose was
used to purify poly(A) tailed RNA. In this procedure, nucleic acids
partially purified as described above or RNA extracted directly with
phenol chloroform (without CTAB treatment) were mixed with oligo d(T)
cellulose (about 2-4 mg/sample) in a binding buffer (about 0.5M NaCl and
10mM Tris, pH 7.5). The mixture was incubated at about 4°C for about
1 hr with gentle shaking and then centrifuged for about 2 minutes in a
microcentrifuge. The oligo d(T) cellulose pellet was washed 3-4 times with
binding buffer and then the poly(A) tailed RNA was eluted with 1X TE
buffer (about 10mM Tris, 1mM EDTA, pH 7.5). The supernate was
collected following centrifugation to remove the oligo d(T) cellulose and
the viral RNA in the supernate was precipitated with ethanol. The RNA
obtained at this stage was basically inhibitor-free and able to be used in
RT-PCR.

In preliminary experiments, Norwalk virus RNA was detected in less
than 0.05g of stool samples using the CTAB technique. A trace inhibitor
activity was observed with RNA extracted with either CTAB or oligo d(T)
alone, but this was easily removed by dilution (1:2) of the extracted
nucleic acid before RT-PCR. Combination of the CTAB and oligo d(T)
techniques resulted in obtaining high quality, inhibitor free RNA which
could be used directly for RT-PCR detection and for cloning of the viral
genome. With development of this method to clone from small amounts
of stool, one skilled in the art will know that they can obtain cDNAs for
the remainder of the genome including those representing the 5'-end of
the genome.
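
The stool quantity quoted above can be related to the suspension volume mentioned earlier in this example (400 ul of a 10% suspension). The following sketch is a unit conversion only and assumes that "10%" means 10% wt/vol, as stated for the extraction protocol.

```python
# Unit conversion relating suspension volume to stool mass.
# Assumes a 10% wt/vol suspension, i.e. 0.10 g of stool per ml.
suspension_ul = 400
grams_per_ml = 0.10

stool_mass_g = (suspension_ul / 1000) * grams_per_ml
print(f"{suspension_ul} ul of suspension contains about {stool_mass_g:.2f} g of stool")
```

The result, about 0.04 g, agrees with the statement that viral RNA was detected in less than 0.05 g of stool.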

For detection with PCR, primers based on the above nucleotide
sequence of the genome were made by chemical methods. These primers
include: Primer 1: CACGCGGAGGCTCTCAAT located at nucleotides
7448 to 7465; Primer 4: GGTGGCGAAGCGGCCCTC located at
nucleotides 7010 to 7027; Primer 8: TCAGCAGTTATAGATATG located
at nucleotides 1409 to 1426; Primer 9: ATGCTATATACATAGGTC
located at nucleotides 612 to 629; Primer 16: CAACAGGTACTACGTGAC
located at nucleotides 4010 to 4027; and Primer 17:
TGTGGCCCAAGATTTGCT located at nucleotides 4654 to 4671 (SEQ ID
NOS 51 through 56, respectively). These primers have been shown to be
useful to detect virus using reverse transcription and polymerase chain
reaction methods (RT-PCR). Figure 7 shows data using these primers.
In primer sets 1 and 4, 8 and 9, and 16 and 17, the reverse complements
of the sequences given above for primers 1, 8, and 17 were used.
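
To make the primer pairing concrete, the sketch below derives the reverse complements used for primers 1, 8 and 17 and the product size each pair would be expected to give on the Norwalk sequence, calculated from the stated coordinates. The helper names are illustrative, and the product sizes are arithmetic from the quoted positions rather than measured band sizes.

```python
# Primer sequences and genome coordinates as listed in the text.
PRIMERS = {
    1:  ("CACGCGGAGGCTCTCAAT", 7448, 7465),
    4:  ("GGTGGCGAAGCGGCCCTC", 7010, 7027),
    8:  ("TCAGCAGTTATAGATATG", 1409, 1426),
    9:  ("ATGCTATATACATAGGTC", 612, 629),
    16: ("CAACAGGTACTACGTGAC", 4010, 4027),
    17: ("TGTGGCCCAAGATTTGCT", 4654, 4671),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def expected_product_size(primer_a, primer_b):
    """Span from the outermost 5' coordinate to the outermost 3' coordinate."""
    _, start_a, end_a = PRIMERS[primer_a]
    _, start_b, end_b = PRIMERS[primer_b]
    return max(end_a, end_b) - min(start_a, start_b) + 1

# Primers synthesized as reverse complements, per the text.
antisense = {n: reverse_complement(PRIMERS[n][0]) for n in (1, 8, 17)}
sizes = {pair: expected_product_size(*pair) for pair in ((1, 4), (8, 9), (16, 17))}

for number, seq in antisense.items():
    print(f"primer {number} as synthesized: {seq}")
for pair, size in sizes.items():
    print(f"primers {pair[0]} and {pair[1]}: expected product of {size} bp")
```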

New, additional primer sets (Table 6 and SEQ ID NOS: 15 to 37)
are used as probes to detect the Norwalk-related viruses. Table 7 shows
the ability of newly selected primer sets 36-35, 69-39, 78-80 to detect many
Norwalk-related viruses. These results are additional examples of the use
of primer sets from the original Norwalk virus sequence to detect
Norwalk-related viruses. Nucleotide sequence data of many of these
viruses indicate that there is a continuum of genetic relatedness within
the RNA region described by primer sets 36-35 or 69-39 of these different
viruses (from 87% to 0%), yet these different agents can be detected using
primers from the Norwalk virus genome sequence. The sequence of 2516
nucleotides of another small round structured virus (SRSV/KY/89, SEQ ID
NO:12) also was obtained by using a total of 8 additional sets of primers
from the original Norwalk virus sequence (primers 56 and 23, 42 and 55,
58 and 59, 60 and 61, 72 and 63, 76 and 77, 64 and 75, and 74 and 3;
Table 6).
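
Relatedness figures such as those quoted above are typically expressed as percent identity over aligned positions. The sketch below shows one simple way such a figure can be computed; it is not the method used to obtain the values in the text, and the two short sequences are made-up placeholders rather than viral sequence.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions between two pre-aligned sequences.

    Hypothetical helper: columns where either sequence has a gap are
    ignored, which is one of several common conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += (a == b)
    return 100.0 * matches / compared if compared else 0.0


# Placeholder sequences for illustration only.
toy_identity = percent_identity("ACGTACGT", "ACGTTCGA")
print(f"{toy_identity:.1f}% identity")
```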

Example 6

Preparation of polyclonal antibodies 
and monoclonal antibodies to Norwalk virus proteins 
Protein(s) encoded in the cDNA fragments or derivatives thereof are
produced in a prokaryotic or eukaryotic expression system and used to
immunize animals to produce polyclonal antibodies for diagnostic assays.
Prokaryotic hosts may include Gram negative as well as Gram positive
bacteria, such as E. coli, S. typhimurium, Serratia marcescens, and
Bacillus subtilis. Eukaryotic hosts may include yeast, insect or
mammalian cells. Immunized animals may include mammals such as
guinea pigs, mice, rabbits, cows, goats or horses or other non-mammalian
or non-murine species such as chickens. Repeated immunization of these
animals with the expressed protein mixed with an adjuvant such as
Freund adjuvant to enhance stimulation of an immune response produces
antibodies to the protein.

Alternatively, synthetic peptides of greater than 15 amino acids
made to match the amino acid sequence deduced from the partial cDNA
sequence (or from other sequences determined by sequencing additional
cDNAs detected with the original or other clones) are linked to a carrier
protein such as bovine serum albumin or lysozyme or cross-linked by
treatment with glutaraldehyde and used to immunize animals to produce
polyclonal antibodies for diagnostic tests.

The serum of animals immunized with either the expressed protein
or with synthetic peptides is tested by immunologic assays such as
immune electron microscopy, Western blots (immunoblots) and blocking
ELISAs to demonstrate that antibodies to Norwalk and related viruses
have been made. Reactivities with the expressed protein or synthetic
peptides show specificity of the polyclonal sera. Reactivities with other
viruses in the Norwalk group (Snow Mountain Agent, Hawaii Agent,
Taunton Agent, etc.) indicate production of a reagent which recognizes
cross-reacting epitopes.

Balb/c mice injected with the immunogens as described above and
shown to have produced polyclonal antibodies are boosted with
immunogen and then sacrificed. Their spleens are removed for fusion of
splenocytes with myeloma cells to produce hybridomas. Hybridomas
resulting from this fusion are screened for their reactivity with the
expressed protein, the peptide and virus particles to select cells producing
monoclonal antibodies to Norwalk virus. Screening of such hybridomas
with Norwalk-related viruses permits identification of hybridomas
secreting monoclonal antibodies to these viruses as well.

Development of Diagnostic Assays 

Analysis of the deduced amino acid sequence of the Norwalk virus
genome has shown that the Norwalk virus has the genetic organization
shown in Figure 8. Expression of regions of this genome in cell-free
translation systems and in the baculovirus expression system has shown
that the 5'-end of the genome encodes nonstructural proteins and the
3'-end of the genome encodes at least one structural protein. Based on this
information, one can express the complete genome or subgenomic regions
of the genome to produce diagnostic assays to detect viral antigens or
immune responses to specific regions of the genome. This information can
be used to detect the Norwalk virus, antigens or immune responses to
Norwalk virus. This information also can be used to detect other similar
currently uncharacterized viruses that cause gastroenteritis or possibly
other diseases. Some of these viruses will be in the Caliciviridae or in the
picornavirus superfamily. All of these viruses will have matching or
similar genomic regions in their DNA sequences.

The availability of cDNA clones from viruses related to Norwalk
virus enables the production of new antibodies and antisera for diagnostic
assays for these related viruses. For example, availability of cDNA clones
from caliciviruses which cannot be cultivated permits the expression of
protein products of those clones. The protein products are used to develop
new antibodies and antisera. In addition, genetic engineering is used to
combine the cDNAs from viruses related to Norwalk virus with the cDNAs
from Norwalk virus to produce chimeric proteins, such that part of the
protein produced is derived from Norwalk virus genome sequence and
another part of the protein is derived from the genome sequence of a virus
related to Norwalk virus. These chimeric proteins are then used to
produce diagnostic reagents, vaccines and antivirals. Examples of the
diagnostic assays are shown in the specific examples and figures below.

Example 7 

Development of diagnostic assays to detect nucleic acids
of Norwalk virus or Norwalk-related viruses by detection
of specific regions of the viral genomes
based on an understanding of the Norwalk virus genome.

The genetic organization of the Norwalk virus genome allows the
prediction of specific regions of the gene sequence as regions where
oligonucleotide primers or probes can be developed to detect Norwalk
virus sequences and common sequences of other related or similar viruses.
Some of these common genome sequences are found in viruses in the
Caliciviridae or in the picornavirus superfamily. The detection can be
done by standard PCR, hybridization or other gene amplification methods.

Two primers, named 35 (CTT GTT GGT TTG AGG CCA TAT,
complementary to nt 4944-4924 in the Norwalk virus genome, SEQ ID
NO: 15) and 36 (ATA AAA GTT GGC ATG AAC A, nt 4475-4493 in the
Norwalk virus genome, SEQ ID NO: 16), were chosen from the region
likely to encode the Norwalk virus RNA polymerase. These primers then
were used to prepare a cDNA clone by reverse transcriptase-PCR from the
nucleotide sequence of human calicivirus Sapporo strain (HuCV Sapporo),
1982 outbreak (Figure 9, SEQ ID NOS). The resulting sequence was
compared to that of Norwalk virus and of feline and rabbit caliciviruses
available from Genbank. The first cDNA clone from Sapporo, named
"c-29_4-gel", determined to contain calicivirus sequence was 488
nucleotides long, of which 40 nucleotides were contributed by primers 36
and 35, leaving 448 nucleotides unique to human calicivirus Sapporo. The
sequence of clone c-29_4-gel between primers 36 and 35 also is shown in
Figure 9, SEQ ID NO:8.
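
The lengths quoted in this paragraph can be checked directly from the primer sequences and coordinates. The sketch below is bookkeeping only; the variable names are illustrative, and the Norwalk product size is calculated from the stated positions rather than taken from the text.

```python
# Primer sequences as given above (spaces removed).
primer_35 = "CTT GTT GGT TTG AGG CCA TAT".replace(" ", "")
primer_36 = "ATA AAA GTT GGC ATG AAC A".replace(" ", "")

# Primer 35 is complementary to nt 4944-4924; primer 36 spans nt 4475-4493.
primer_35_span = 4944 - 4924 + 1
primer_36_span = 4493 - 4475 + 1

# Nucleotides of the clone contributed by the two primers.
primer_total = len(primer_35) + len(primer_36)

# Sapporo clone c-29_4-gel: total length minus the primer contribution.
clone_length = 488
unique_sapporo = clone_length - primer_total

# Product size expected for the same primer pair on the Norwalk sequence,
# calculated from the coordinates (a derived value, not quoted in the text).
norwalk_product = 4944 - 4475 + 1

print(f"primer 35: {len(primer_35)} nt, primer 36: {len(primer_36)} nt")
print(f"virus-derived sequence in c-29_4-gel: {unique_sapporo} nt")
print(f"corresponding Norwalk product: {norwalk_product} bp")
```

The primer lengths match their coordinate spans, and together they account for the 40 primer-derived nucleotides stated above.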

Evidence that the HuCV Sapporo cDNA clone was correct is shown
by five facts. First, the sequence exhibits strong homology with Norwalk
virus, feline calicivirus, and the rabbit calicivirus at the nucleotide and
amino acid levels (see Figure 10 and Tables 7 and 8). Second, the
sequence contains a continuous protein encoding region on the positive
strand. In Norwalk, feline, and rabbit caliciviruses continuous protein
encoding regions also are found in the region of homology. Third, the
sequence contains the amino acid motif YGDD, which is a marker for
RNA virus proteins which have RNA-dependent RNA polymerase activity.
In c-29_4-gel, the YGDD motif is at the predicted distance from the ends
of the sequence. Fourth, the same cDNA product was obtained from six
different stool specimens. Fifth, no significant homologies were found for
other sequences in the Genbank.
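
The second and third checks listed above (an open coding region and the YGDD motif) lend themselves to a short script. The following is a minimal sketch, assuming the standard genetic code and searching only the three forward reading frames; the function names are illustrative and the test sequence is a made-up placeholder, not the c-29_4-gel sequence.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): _AMINO[i]
    for i, codon in enumerate(product(_BASES, repeat=3))
}

def translate(seq, frame=0):
    """Translate a nucleotide sequence in the given forward frame (0, 1 or 2).

    Unknown codons are reported as 'X'; stop codons as '*'.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def find_motif(seq, motif="YGDD"):
    """Return (frame, amino acid position) for each motif hit, forward frames only."""
    hits = []
    for frame in range(3):
        protein = translate(seq, frame)
        pos = protein.find(motif)
        while pos != -1:
            hits.append((frame, pos))
            pos = protein.find(motif, pos + 1)
    return hits

# Placeholder sequence for illustration only: YGDD encoded in frame 2.
toy_sequence = "GC" + "TATGGTGATGAT" + "AAA"
toy_hits = find_motif(toy_sequence)
print(toy_hits)
print(translate(toy_sequence, frame=2))
```

A frame whose translation contains the motif and no "*" characters would satisfy both criteria; the distance of the hit from each end can then be compared with the position expected from the Norwalk polymerase region.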

The nucleotide sequence of c-29_4-gel was used to synthesize an
internal primer. This internal primer was used to prepare a second set of
RT-PCR products from human calicivirus Sapporo RNA. A number of
new cDNA clones were obtained of which one, named "at23s2m31",
contains overlapping sequence which is 5' on the virus genome from that
contained in c-29_4-gel. Sequence at23s2m31 is 149 nucleotides long
(SEQ ID NO:7) and overlaps c-29_4-gel by 46 nucleotides. See Figure 9
for at23s2m31 sequence and area of overlap with c-29_4-gel. The
resulting combined sequence information of c-29_4-gel and at23s2m31 is
551 nucleotides in length, excluding the portion of c-29_4-gel contributed by
primer 35.

Although the human calicivirus Sapporo sequence was generated
from knowledge of the Norwalk virus sequence, the former is
distinguishable in the same region (see Table 8 or Figure 9). The known
sequence of human calicivirus Sapporo indicates that this virus is more
closely related to the animal caliciviruses than to Norwalk virus.

In May, 1987, a child in Houston was infected with a virus which
was identified as a calicivirus based on its morphology. Samples
containing virus particles from this child failed to react in serologic assays
developed for the detection of Norwalk virus and human calicivirus
Sapporo. Primers 36 and 35 were used to prepare cDNA from the viral
genome of this strain using RT-PCR. The resulting cDNA product, called
4847 complete, is 434 nucleotides long, excluding the primers, and is
distinguishable from that of Norwalk virus and human calicivirus
Sapporo (see "Houston" in homology comparison in Figure 10; Table 10
and SEQ ID NO:10). Evidence that this Houston cDNA is correct is the
same as that listed for c-29_4-gel above, except that homology with
Norwalk virus and human calicivirus Sapporo is not statistically
significant.

Use of the sequence from the human calicivirus
Sapporo strain to produce amplification primers
for human calicivirus Sapporo and related agents
The known sequence of human calicivirus Sapporo overlaps one of
the two primers, called primer 36 (see Table 6), used for the initial
amplification of cDNA clone c-29_4-gel. Examination of the homology of
known calicivirus sequences (Table 8, SEQ ID NOS 57 through 62) in that
region indicated that a new 36 primer could be synthesized and used to
amplify caliciviruses more closely related to human calicivirus Sapporo
than Norwalk virus. A new primer was synthesized and is called primer
"new 36" (see Table 6, last line, and SEQ ID NO:37).

The new 36 primer was used with primer 35 to generate a cDNA
clone from a calicivirus which caused a diarrhea outbreak in November,
1986, in a Houston day care center ("Day care"). The calicivirus strain
causing this Day care outbreak was antigenically related to human
calicivirus Sapporo but antigenically distinct from Norwalk virus by EIA.
The Day care cDNA product obtained from the RT-PCR reaction with
primers new 36 and 35 is 445 nucleotides long, excluding the primers (see
Figure 9 and SEQ ID NO:9), and has close homology to human calicivirus
Sapporo and a more distant, yet still significant homology with Norwalk
virus, as shown in Figure 10. Evidence that this Day care cDNA is correct
is the same as that listed for c-29_4-gel above.

Use of primers 35 and 36 derived from the Norwalk virus 
sequence to derive a cDNA clone from an animal calicivirus 
A calicivirus was isolated from the mouth of the pygmy chimpanzee,
Pan paniscus. This calicivirus is antigenically distinct from the human
calicivirus Sapporo strain by EIA. A cDNA was produced from the
primate calicivirus (PrCV) RNA using RT-PCR and primers 36 and 35.
The complete nucleotide sequence of this cDNA is not yet available. The
cDNA, called atprcvw2 (Figure 20; SEQ. ID. NOS 13 and 14), is of the
predicted size and has significant nucleotide homology with human
calicivirus Sapporo, feline calicivirus(es), and the rabbit calicivirus in the
region of known sequence. No significant homology with Norwalk virus
has been observed in the region of known sequence. The known amino
acid sequence contains the YGDD motif on the positive strand at the
predicted distance from primer 35.

Use of multiple primers from the Norwalk virus genomic sequence to
detect and characterize KY89, another small round virus associated
with an outbreak of gastroenteritis.
The known sequence for Norwalk virus is used to obtain the
sequence of other viruses such as SRSV/KY/89, an agent from a stool from
an outbreak of gastroenteritis in Japan in 1989. Originally, cDNA
products and sequence information were obtained using primer sets 36-35.
Continued work with another 8 sets of primers (Primers 56 and 23, 42
and 55, 58 and 59, 60 and 61, 72 and 63, 76 and 77, 64 and 75, and 74 and
3 in Table 6 and SEQ ID NOS:21 through 36) allowed the SRSV/KY/89
sequence of 2516 nucleotides to be determined (Figures 11 and 12, SEQ
ID NO:12). This sequence includes part of the polymerase region and
the capsid region of the genome. Figures 14 and 6 (SEQ ID NOS 38
through 50 and 63 through 75) show sequences from other
Norwalk-related viruses. Continued use of this approach with other
Norwalk-related viruses (such as those shown in Table 7) allows the discovery of
the complete sequences of multiple Norwalk-related viruses. Those skilled
in the art will realize that the use of such sequence information and
expression of fragments and derivatives of Norwalk-related viruses
permits development of diagnostic assays to detect antibodies, antigens,
viral genetic material or antivirals and to develop vaccines for specific
Norwalk-related viruses in the same manner that Norwalk virus
fragments and derivatives have been used.

Example 8 

Development of diagnostic assays using expressed Norwalk
virus proteins to detect immune responses to Norwalk virus

Protein(s) encoded in the Norwalk virus genome or fragments or
derivatives thereof are produced in a prokaryotic or eukaryotic expression
system and used as antigens in diagnostic assays to detect immune
responses following virus infections. Prokaryotic hosts may include Gram
negative as well as Gram positive bacteria, such as Escherichia coli,
Salmonella typhimurium, Serratia marcescens, Bacillus subtilis,
Staphylococcus aureus and Streptococcus sanguinis. Eukaryotic hosts
may include yeast, insect or mammalian cells. Diagnostic assays may
include many formats such as enzyme-linked immunosorbent assays,
radioimmunoassays, immunoblots or other assays. Figure 15 shows data
for a capsid protein encoded from the 3'-end of the Norwalk virus genome.
It is expressed by nucleotides 5337 through 7753 of the DNA sequence
shown in Table 2 and Figure 8. This protein has an approximate
molecular weight of 58,500 and is hereinafter referred to as the 58,500
mwt protein. It was produced in insect cells infected with baculovirus
recombinants (C-6 and C-8). A band (see arrow in Figure 15) representing
the 58,500 mwt protein in C-6 and C-8 infected cells is not seen in insect
cells infected with wild-type (WT) baculovirus or in mock infected cells.
Other proteins encoded by Norwalk virus cDNA or fragments or
derivatives are similarly expressed using baculovirus recombinants and
other expression systems.

Figure 16 shows data using the 58,500 mwt protein produced using
the baculovirus expression system to detect immune responses before and
after infection of volunteers with Norwalk virus inoculum. Antigen was
put on ELISA plates and pre- and post-infection human sera were added.
The data show that when an individual has had the infection, the
post-infection serum reacts strongly to the antigen. Other proteins encoded in the
Norwalk virus cDNA or fragments or derivatives thereof are similarly
used to detect immune responses following Norwalk virus infection.

Some proteins have the intrinsic property of being able to form
particles. The 58,500 mwt protein discussed above has that property.
Particles formed from proteins are expressed in any expression system
and used to produce diagnostic assays based on detection of antibody
responses or immune responses. Figure 17 shows an electron micrograph
of particles produced using the baculovirus expression system from
recombinants containing the 3'-end of the Norwalk genome. These
particles are similar in size to the native virus particles. They are
antigenic, immunoreactive and immunogenic. They differ from most of
the virus particles resulting from natural infection in that many of the
expressed particles lack nucleic acids. The rNV particles are highly
immunogenic when given parenterally to mice, rabbits and guinea pigs
and when given orally to mice.

Figure 18 shows data on the properties of rNV particles following
centrifugation in gradients of CsCl. The density of the particles
(symbolized by closed boxes) is 1.31 g/cc, which is distinct from the 1.38
g/cc density of particles purified from the original infectious Norwalk
inoculum given to volunteers. The gradients were fractionated. Each
fraction was put on an ELISA plate and human serum was then
introduced. The open boxes show that there was no ELISA activity with
the pre-infection serum. The closed diamonds show there was reactivity
with the post-infection serum. Other particles made from other proteins
encoded in the Norwalk virus cDNA or fragments or derivatives thereof
are similarly used to detect immune responses following Norwalk virus
infection.

Figure 19 shows data using purified particles formed by the 58,500
mwt protein to detect immune responses in post-inoculation (but not
pre-inoculation) serum samples of 9 volunteers infected with Norwalk virus.
One of the volunteers, number 6, exhibited no symptoms of Norwalk virus
infection based on monitoring clinical symptoms or measuring an immune
response. Purified, expressed particles were put on ELISA plates and one
pre- and one post-infection serum sample from each volunteer were added
to the particles. The amount of antibody binding to the particles in each
pre- and post-infection sample was measured. The data in Figure 19 show
that the expressed proteins form particles that are immunoreactive and
antigenic. Other proteins encoded in the Norwalk virus cDNA or
fragments or derivatives thereof are similarly used to detect
immunoreactive and antigenic activity.
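
The paired-serum comparison described for Figure 19 can be sketched in a
few lines of Python. This is only an illustration of the bookkeeping: the
function name, the ratio cutoff of 4.0 and the readings are hypothetical
placeholders, not values taken from the figure or from this specification.

```python
def classify_pair(pre_signal, post_signal, min_ratio=4.0):
    """Label one volunteer's paired sera by the rise in ELISA signal.

    min_ratio is a hypothetical cutoff chosen for illustration only.
    """
    if pre_signal <= 0:
        raise ValueError("pre-infection signal must be positive")
    ratio = post_signal / pre_signal
    return "responder" if ratio >= min_ratio else "non-responder"


# Placeholder readings in arbitrary units, NOT data from Figure 19.
paired_sera = {
    "volunteer A": (0.10, 1.20),
    "volunteer B": (0.15, 0.18),
}

for name, (pre, post) in paired_sera.items():
    print(name, classify_pair(pre, post))
```

Any real cutoff would have to be set from control sera for the particular
assay format in use.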

Additional developments of diagnostic assays for the detection of
Norwalk and Norwalk-related viruses also were pursued. First, new
ELISA assays were made based on utilizing the Norwalk virus capsid
protein that was engineered to be synthesized from a cDNA fragment that
was deduced from the Norwalk virus cDNA sequence and then produced
using the baculovirus expression system. This expressed Norwalk virus
capsid protein self-assembled into recombinant Norwalk virus particles
(rNV). Two new ELISA assays were established using this rNV antigen.
One assay detects antiviral antibody and the other detects viral antigen.
Both the ELISAs are very sensitive when compared to the previous assays
(based on reagents from human volunteers) available to detect such
agents. Further characterization of the antibody ELISA has shown this
assay detects immune responses following human infections with Norwalk
virus and a subset of human infections with viruses in the Norwalk group
such as Snow Mountain and Hawaii agents. In contrast, the antigen
ELISA is based on use of hyperimmune serum made to the baculovirus
expressed recombinant Norwalk virus particles (rNV). This antigen
ELISA has been found to be very specific in that it recognizes the
prototype Norwalk virus (8FIIa) and a subset of closely related agents, but
not all other viruses in the Norwalk group such as the Snow Mountain
agent and Hawaii agent (See Tables 1 and 7). While the antigen ELISA
does not detect other viruses in the Norwalk group such as the small
round structured viruses or caliciviruses, these and other Norwalk-related
viruses have been detected using primers selected from the
nucleotide sequence of Norwalk virus (See Table 7).

To develop more broadly reactive diagnostic assays, ELISAs based
on using other fragments of the Norwalk virus genome were developed.
The new diagnostic assays are based on detection of antibody responses
or of antigens deduced from fragments of the Norwalk virus genome other
than the capsid region. An example of this approach, with supporting
data, follows.

One Norwalk virus nonstructural protein is predicted to be encoded
in the first ORF of the Norwalk viral genome. This ORF is located at the
5'-end of the viral genome and its product has a predicted molecular
weight of 190,000 (190K). Whether this ORF 1 is useful in diagnostic
assays first was evaluated by expressing the protein encoded in the full
length viral RNA, and then synthesizing and testing the immunoreactivity
of the encoded protein using a cell-free system. This was accomplished by
in vitro transcription of a full length cDNA (pGNV-F) of the Norwalk viral
genome cDNAs. This full-length cDNA was constructed by ligation of
subgenomic derivatives of the original Norwalk virus cDNAs shown in the
physical map in Figure 5. The in vitro synthesized NV mRNAs next were
examined for their ability to direct the synthesis of a Norwalk virus
specific protein by cell-free translation in rabbit reticulocyte lysates in the
presence of 35S-methionine to produce a radiolabeled protein. The
expressed proteins were analyzed by polyacrylamide gel electrophoresis
(PAGE). A clear band of approximate molecular weight of 130,000 was
observed in the sample containing the viral RNA but not in the negative
control (without viral RNA). The immunoreactivity of this protein was
examined by reactivity with pre- and post-infection sera from volunteers
given Norwalk virus. The 130K protein was precipitated by a
convalescent serum of a volunteer infected with Norwalk virus, but not by
serum collected before infection, indicating this protein was virus-specific.
This showed this 130K protein contains some immunoreactive epitopes.
The apparently smaller size of the protein made in this translation system
suggested that either the protein migrates aberrantly on gels, or an
internal initiation codon was used to begin translation, or some type of
post-translational modification may have occurred after the protein was
translated.

To further characterize immunoreactive derivatives of the Norwalk
virus cDNA useful for diagnostic assays, the 2C region of the Norwalk
viral genome (see Figure 8) was expressed using the baculovirus
expression system. This region was selected for initial expression because
it is located at the 5'-end of the non-structural protein and a high level of
conservation was found between the sequence of the predicted Norwalk
virus protein and new sequence published for related caliciviruses and
picornaviruses. A 5'-end cDNA fragment of the viral genome was subcloned
into the baculovirus transfer vector pVL 1393. After co-transfection of
insect Sf9 cells with wild-type baculovirus DNA, recombinants containing
the Norwalk viral gene were identified and selected. After three rounds
of plaque purification, radiolabeled lysates of recombinant-infected insect
cells were prepared, and the radiolabeled proteins were analyzed by
PAGE. The results showed that a protein of apparent molecular weight
of 57,000 (57K) was made in recombinant-infected but not in uninfected
cells. The size of the protein suggested that the internal AUG initiation
codon located at nucleotide 953 was used for making this protein. This
57K protein also was precipitated by convalescent serum (but not by
pre-infection serum) from a volunteer who was infected with Norwalk virus.
This protein mainly remained cell-associated. One skilled in the art will
readily see that improvements in the yield and purification of this 2C
nonstructural protein are possible and will yield more rapid ELISAs to
detect Norwalk and related virus infections. One skilled in the art also
will see that by expressing proteins from other regions of the Norwalk
viral genome (e.g., 3C-like, 3D-like and the third ORF), diagnostic assays
are made for Norwalk and related viruses similar to the ELISAs made with
the 2C nonstructural and rNV structural protein. These new assays
should widen the spectrum in detection of Norwalk-related viruses.
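
The reading-frame arithmetic behind the internal AUG at nucleotide 953
can be checked with a short Python sketch. It assumes, following Table 3,
that the first ORF begins at nucleotide 146; the function names are
illustrative, and the test segment is nucleotides 941 through 988 as
printed in Table 3.

```python
ORF1_START = 146  # first base of the ORF 1 initiation codon, per Table 3


def codon_number(nt_pos, orf_start=ORF1_START):
    """Return the 1-based codon number if nt_pos is the first base of an
    in-frame codon of the ORF, otherwise None."""
    offset = nt_pos - orf_start
    if offset < 0 or offset % 3 != 0:
        return None
    return offset // 3 + 1


def in_frame_atgs(segment, segment_start, orf_start=ORF1_START):
    """List genome positions of ATG triplets in `segment` that lie in the
    same reading frame as the ORF."""
    hits = []
    for i in range(len(segment) - 2):
        pos = segment_start + i
        if segment[i:i + 3] == "ATG" and codon_number(pos, orf_start) is not None:
            hits.append(pos)
    return hits


# Nucleotides 941-988, copied from the corresponding row of Table 3.
SEGMENT = "AGCAAGTTAGAGATGGTTAGGGATGCAGTGCTAGCCGCTATAAATGGG"

print(codon_number(953))            # codon index of the AUG at nt 953
print(in_frame_atgs(SEGMENT, 941))  # in-frame ATG positions in this stretch
```

Under these assumptions nucleotide 953 falls on codon 270 of ORF 1, which
agrees with the Met at residue 270 in Table 3. The other two ATG triplets
in the segment are out of frame.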

The initial lack of sensitive methods to detect Norwalk and
Norwalk-related viruses made the many Norwalk-related viruses
difficult to describe and define. However, as shown in Table 7, the methods
and data provided here demonstrate how the discovery of the nucleotide
sequence of the Norwalk virus genome has led to the ability to develop
tests to detect Norwalk virus and other related agents. The data and
methods also demonstrate that fragments and derivatives of the Norwalk
virus genome can be used to provide evidence of and immunity against
Norwalk and related viruses.

Example 9

Development of diagnostic assays using expressed
Norwalk virus and Norwalk-related viruses to detect viral antigens

Individual proteins, particles or protein aggregates formed from
expression of one or more Norwalk virus genes in any prokaryotic or
eukaryotic expression system are used as an immunogen to inoculate
animals to produce polyclonal and monoclonal antibodies for diagnostic
assays to detect viral antigens.

Recombinant Norwalk virus particles (rNV) produced using the
baculovirus expression system have been used to produce polyclonal
antibodies in mice, guinea pigs and rabbits following parenteral
immunization (see Table 9). Mice given rNV orally also have developed
serum antibodies. Hybridomas from mice immunized with rNV also have
been obtained following fusion with myeloma cells. Use of these antibodies
in a capture ELISA has shown NV antigen can be detected. This antigen
ELISA based on the antiserum made to the rNV particles is quite specific
and it detects only a subset of Norwalk-related viruses (See Table 7).
Therefore, additional capsid antigens from other Norwalk-related viruses
(such as Snow Mountain, Hawaii etc.) must be expressed to produce a more
broadly reactive ELISA for capsid antigen. The ELISA is only one format
that can be used to detect virus antigen. Other formats could include
immunofluorescence or immunocytochemistry, or immune electron
microscopy. The comparison of the capsid sequences of Norwalk virus and
Norwalk-related viruses permits the identification of conserved regions of
the capsid protein. Use of fragments of such sequences to immunize
animals can result in the production of antisera with broader
reactivity to Norwalk-related viruses. Alternatively, sequential
immunization of animals with expressed proteins of Norwalk and
Norwalk-related viruses will result in antiserum with the desired broad
reactivity. Antigen detection assays that are specific to one or a few
strains of Norwalk and Norwalk-related viruses and additional assays that
are more broadly reactive each will have use.

Expression of fragments of proteins encoded in other regions of the
genome can be used to produce antiserum to other proteins for use in
ELISAs to detect viral antigens. The expression of the first ORF that
represents a polyprotein encoded in the 5'-end of the genome and
fragment 2C of the polyprotein has shown that each of these
nonstructural proteins is immunoreactive and antiserum made to these
can be used to develop diagnostic assays to detect these viral proteins.
These assays can be broadly reactive and detect many other
Norwalk-related viruses because of sequence conservation. Those skilled in
the art will recognize that knowledge of the genome organization of Norwalk
virus permits similar expression of the same regions of the genomes of
other Norwalk-related viruses for use in diagnostic assays to detect viral
antigens.

Example 10

Development of a vaccine using
Norwalk virus expressed antigens
Vaccines for Norwalk virus, the Norwalk group of viruses or other
small round viruses are made from an expressed Norwalk virus protein.
That expressed protein can be a Norwalk virus capsid protein expressed
alone or in combination with one or more other Norwalk virus proteins
or self-forming particles. For example, the particles shown in Figure 17
were produced using the baculovirus expression system. They are used as
a vaccine when expressed alone or in combination with one or more other
Norwalk virus proteins. Similarly, the other proteins encoded in the
Norwalk virus cDNA or fragments or derivatives thereof are used as a
vaccine when expressed alone or in combination with one or more
Norwalk virus proteins.

Individuals are vaccinated orally, parenterally or by a combination
of both methods. For parenteral vaccination, the expressed protein is
mixed with an adjuvant and administered in one or more doses in
amounts and at intervals that give maximum immune response and
protective immunity. Oral vaccination parallels natural infection by
Norwalk virus inoculum, i.e. the individual ingests the vaccine with
dechlorinated water or buffer. Oral vaccination may follow sodium
bicarbonate treatment to neutralize stomach acidity. For example,
sodium bicarbonate solution is taken by each person 2 minutes before and
5 minutes after vaccine administration.

Example 11

Production of a vaccine for other agents by using
expressed Norwalk virus capsids as a carrier or vehicle
for the expression of other antigens
or parts of other antigens

Identification of the region of the genome that encodes the Norwalk
virus capsid protein and that forms particles following expression (i.e.,
regions 5346 through 6935 and 5337 through 7753) allows genetic
engineering of the cDNA that encodes the capsid protein to incorporate
one or more heterologous pieces of cDNA that encode antigenic epitopes.
Expression of such recombinant genes produces a recombinant capsid that
is antigenic, induces antibodies, and protects against Norwalk virus and
its antigens, and against the heterologous epitopes or antigens.

Alternatively, the Norwalk virus capsid protein carrier is mixed with
or covalently linked to one or more heterologous protein antigens or
synthetic peptides containing heterologous epitopes. This mixture is
antigenic, induces antibodies, and protects against Norwalk virus and its
antigens, and against the heterologous epitopes or antigens.

Individuals are vaccinated using the oral and parenteral methods
described above in Example 10.

Example 12
Kit

Kits for detecting immune responses to Norwalk virus are
prepared by supplying in a container a protein deduced from the Norwalk
virus genome shown in Table 2 or fragments or derivatives thereof.
Similar proteins are prepared from Norwalk-related viruses to detect
immune responses to the Norwalk-related viruses. For example, the
protein encoded by Norwalk virus nucleotides 1 through 7753, the protein
encoded by Norwalk virus nucleotides 146 through 5359, the protein
encoded by Norwalk virus nucleotides 5337 through 7573, the protein
encoded by Norwalk virus nucleotides 5346 through 6935, the protein
encoded by Norwalk virus nucleotides 6938 through 7573 and any
combinations thereof may be used in such kits. The kit can also include
controls for false positives and false negatives, reagents and sample
collection devices. The kit can be equipped to detect one sample or
multiple samples.

Example 13
Kit

Kits for detecting Norwalk viruses and Norwalk-related viruses are
prepared by supplying in a container at least one antiserum made from a
protein expressed from the deduced amino acid sequence of the Norwalk
virus genome shown in Tables 3, 4, or 5 or from a fragment or derivative
of the deduced amino acid sequence. Similar antisera are made from
proteins encoded by Norwalk-related virus genomes. For example, an
antiserum made to the protein encoded by Norwalk virus nucleotides 1
through 7753, the protein encoded by Norwalk virus nucleotides 146
through 5359, the protein encoded by Norwalk virus nucleotides 5337
through 7573, the protein encoded by Norwalk virus nucleotides 5346
through 6935, the protein encoded by Norwalk virus nucleotides 6938
through 7573 and any combination thereof may be used in such kits. The
kit can also include controls for false positives and false negatives,
reagents and sample collection devices. The kit can be equipped to detect
one sample or multiple samples.
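
As a rough cross-check of the nucleotide ranges named in the kit
examples, the following Python sketch converts a range into a length and
an approximate protein mass. The figure of 110 daltons per residue is a
general rule-of-thumb average and is an assumption here, not a value from
this specification. The estimate also counts every triplet in the range,
so it does not exclude a terminal stop codon.

```python
AVG_RESIDUE_MASS = 110  # daltons per residue; rule-of-thumb assumption

# Nucleotide ranges as written in the kit examples.
SEGMENTS = {
    "146-5359": (146, 5359),
    "5346-6935": (5346, 6935),
    "6938-7573": (6938, 7573),
}


def segment_length(start, end):
    """Number of nucleotides in an inclusive range."""
    return end - start + 1


def approx_mass(start, end):
    """Very rough protein mass for a range read as consecutive codons."""
    codons = segment_length(start, end) // 3
    return codons * AVG_RESIDUE_MASS


for name, (start, end) in SEGMENTS.items():
    print(name, segment_length(start, end), approx_mass(start, end))
```

With these assumptions the 146 through 5359 range comes out near 191,000
and the 5346 through 6935 range near 58,300, in line with the 190K
prediction for the first ORF and the 58,500 mwt capsid protein described
earlier.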

In conclusion, it is seen that the present invention and the
embodiments disclosed herein are well adapted to carry out the objectives
and attain the ends and advantages mentioned as well as others inherent
therein. The novel features characteristic of this invention are set forth
in the appended claims. While presently preferred embodiments of the
invention have been described for the purpose of disclosure, numerous
changes in the details of synthesis and use described herein will be
apparent to those skilled in the art. It should be understood, however,
that there is no intention to limit the invention to the specific form
disclosed, but on the contrary, the intention is to cover all modifications,
alternative means of synthesis and use and equivalents falling within the
spirit and scope of the invention.

Table 2. The nucleotide sequence of the Norwalk virus genome.



GGCGTCAAAA GACGTCGTTC CTACTGCTGC TAGCAGTGAA AATGCTAACA ACAATAGTAG 60
TATTAAGTCT CGTCTATTGG CGAGACTCAA GGGTTCAGGT GGGGCTACGT CCCCACCCAA 120
CTCGATAAAG ATAACCAACC AAGATATGGC TCTGGGGCTG ATTGGACAGG TCCCAGCGCC 180
AAAGGCCACA TCCGTCGATG TCCCTAAACA ACAGAGGGAT AGACCACCAC GGACTGTTGC 240
CGAAGTTCAA CAAAATTTGC GTTGGACTGA GAGACCACAA GACCAGAATG TTAAGACGTG 300
GGATGAGCTT GACCACACAA CAAAACAACA GATACTTGAT GAACACGCTG AGTGGTTTGA 360
TGCCGGTGGC TTAGGTCCAA GTACACTACC CACTAGTCAT GAACGGTACA CACATGAGAA 420
TGATGAAGGC CACCAGGTAA AGTGGTCGGC TAGGGAAGGT GTAGACCTTG GCATATCCGG 480
GCTCACGACG GTGTCTGGGC CTGAGTGGAA TATGTGCCCG CTACCACCAG TTGACCAAAG 540
GAGCACGACA CCTGCAACTG AGCCCACAAT TGGTGACATG ATCGAATTCT ATGAAGGGCA 600
CATCTATCAT TATGCTATAT ACATAGGTCA AGGCAAGACG GTGGGTGTAC ACTCCCCTCA 660
AGCAGCCTTC TCAATAACGA GGATCACCAT AGAGCCCATA TCAGCTTGGT GGCGAGTCTG 720
TTATGTCCCA CAACCAAAAC AGAGGCTGAC ATACGACCAA CTCAAAGAAT TAGAAAATGA 780
ACCATGGCCG TATGCCGCAG TCACGAACAA CTGCTTCGAA TTTTGTTGCC AGGTCATGTG 840
CTTGGAAGAT ACTTGGTTGC AAAGGAAGCT CATCTCCTCT GGCCGGTTTT ACCACCCGAC 900
CCAAGATTGG TCCCGAGACA CTCCAGAATT CCAACAAGAC AGCAAGTTAG AGATGGTTAG 960
GGATGCAGTG CTAGCCGCTA TAAATGGGTT GGTGTCGCGG CCATTTAAAG ATCTTCTGGG 1020
TAAGCTCAAA CCCTTGAACG TGCTTAACTT ACTTTCAAAC TGTGATTGGA CGTTCATGGG 1080
GGTCGTGGAG ATGGTGGTCC TCCTTTTAGA ACTCTTTGGA ATCTTTTGGA ACCCACCTGA 1140
TGTTTCCAAC TTTATAGCTT CACTCCTGCC AGATTTCCAT CTACAGGGCC CCGAGGACCT 1200
TGCCAGGGAT CTCGTGCCAA TAGTATTGGG GGGGATCGGC TTAGCCATAG GATTCACCAG 1260
AGACAAGGTA AGTAAGATGA TGAAGAATGC TGTTGATGGA CTTCGTGCGG CAACCCAGCT 1320
CGGTCAATAT GGCCTAGAAA TATTCTCATT ACTAAAGAAG TACTTCTTCG GTGGTGATCA 1380
AACAGAGAAA ACCCTAAAAG ATATTGAGTC AGCAGTTATA GATATGGAAG TACTATCATC 1440
TACATCAGTG ACTCAGCTCG TGAGGGACAA ACAGTCTGCA CGGGCTTATA TGGCCATCTT 1500
AGATAATGAA GAAGAAAAGG CAAGGAAATT ATCTGTCAGG AATGCCGACC CACACGTAGT 1560
ATCCTCTACC AATGCTCTCA TATCCCGGAT CTCAATGGCT AGGGCTGCAT TGGCCAAGGC 1620
TCAAGCTGAA ATGACCAGCA GGATGCGTCC TGTGGTCATT ATGATGTGTG GGCCCCCTGG 1680
TATAGGTAAA ACCAAGGCAG CAGAACATCT GGCTAAACGC CTAGCCAATG AGATACGGCC 1740
TGGTGGTAAG GTTGGGCTGG TCCCACGGGA GGGAGTGGAT CATTGGGATG GATATCACGG 1800
AGAGGAAGTG ATGCTGTGGG ACGACTATGG AATGACAAAG ATACAGGAAG ACTGTAATAA 1860
ACTGCAAGCC ATAGCCGACT CAGCCCCCCT AACACTCAAT TGTGACCGAA TAGAAAACAA 1920
GGGAATGCAA TTTGTGTCTG ATGCTATAGT CATCACCACC AATGCTCCTG GCCCAGCCCC 1980
AGTGGACTTT GTCAACCTCG GGCCTGTTTG CCGAAGGGTG GACTTCCTTG TGTATTGCAC 2040
GGCACCTGAA GTTGAACACA CGAGGAAAGT CAGTCCTGGG GAGAGAACTG GACTGAAAGA 2100
CTGCTTCAAG CCCGATTTCT CACATCTAAA AATGGAGTTG GCTCCCCAAG GGGGCTTTGA 2160
TAACCAAGGG AATACCCCGT TTGGTAAGGG TGTGATGAAG CCCACCACCA TAAACAGGCT 2220
GTTAATCCAG GCTGTAGCCT TGAOGATGGA GAGACAGGAT GAGTTCCAAC TCCAGGGGCC 2280
TAOGTATGAC TTTGATACTG ACAGAGTAGC TGCGTTCACG AGGATGGCCC GAGCCAACGG 2340
GTTGGGTCTC ATATCCATGG CCTCCCTAGG CAAAAAGCTA CGCAGTGTCA CCACTATTGA 2400
AGGATTAAAG AATGOTCTAT CAGGCTATAA AATATCAAAA TGCAGmTAC AATGGCAGTC 2460
AAGGGTGTAC ATTATAGAAT CM3ATGGTGC CAGTGTACAA ATCAAAGAAG ACAAGCAAGC 2520
TTTGACCCCT CTGGAGGAGA CAATTAACAC GGCCTCACTT GCCATCACTC GACTCAAAGC 2580
AGCTAGGGCT GTGGCATACG CTTCATGTTT CCAGTCCGCC ATAACTACCA TACTACAAAT 2640
GGCGGGATCT GOGCTCGTTA TTAATCGAGC GGTCAAGCGT ATGTTTGGTA CCCGTACAGC 2700
AGCCATGGCA TTAGAAGGAC CTGGGAAAGA AGATAATTGC AGGGTCGATA AGGCTAAGGA 2760
AGCTGGAAAG GGGCCCATAG GTCATGATGA CATGGTAGAA AGGTTTGGCC TATGTGAAAC 2820
TGAAGAGGAG GAGAGTGAGG ACCAAATTCA AATGGTACCA AGTGATGCCG TCCCAGAAGG 2880
AAAGAACAAA GGCAAGACCA AAAAGGGACG TGGTOGCAAA AATAACTATA ATGCATTCTC 2940
TCGCCGTGGT CTGAGTGATG AAGAATATGA AGAGTACAAA AAGATCAGAG AAGAAAAGAA 3000
TGGCAATTAT AGTATACAAG AATACTTGGA GGACOGCCAA CGATATGAGG AAGAATTAGC 3060
AGAGGTACAG GCAGGTGGTG ATGGTGGCAT AGGAGAAACT GAAATGGAAA TCCGTCACAG 3120
GGTCTTCTAT AAATCCAAGA GTAAGAAACA CCAACAAGAG CAAOGGCGAC AACTTGGTCT 3180
AGTGACTGGA TCAGACATCA GAAAACGTAA GCCCATTGAC TGGACCCCGC CAAAGAATGA 3240
ATGGGCAGAT GATGACAGAG AGGTGGATTA TAATGAAAAG ATCAATTTTG AAGCTCCCCC 3300
GACACTATGG AGCCGAGTCA CAAAGTTTGG ATGAGGATGG [illegible] TCAGCCOGAC 3360
AGTGTTCATC ACAACCACAC ATGTAGTGCC AACTGGTGTG AAAGAATTCT TTGGTGAGCC 3420
Cf^TATCTAftT ATAGCAATCC ACCAAGCAGG TGAGTTCACA CAATTCAGGT TCTCAAAGAA 3480
[illegible] GACTTGACAG GTATGGTCCT TGAAGAAGGT TGCCCTGAAG GGACAGTCTG 3540
[illegible] ATTAAACfifiG ATTCGGGTGA ACTACTTCCG CTAGCOGTCC GTRTGGGGGC 3600
[illegible] ATCAGGATAC AGGGTCGGCT TGTCCATGGC CAATCAGGGA [illegible] 3660
AGGGGCCAAT GCAAAGfiGCA TGGATCTTGG CACTATACCA GGAGACTGOG GGGCACCATA 3720
CGTCCACAAG CGGGGGAATG ACTGGGTTGT GTGTGGAGTC CAOGCTGCAG CCACAAAGTC 3780
AGGCAACACC GTGGTCTGCG CTGTACAGGC TGGAGAGGGC GAAACCGGAC TAGAAGGTGG 3840
AGACAAGGGG CATTATGCCG GCCACGAGAT TGTGAGGTAT GGAAGTGGCC CAGCACTGTC 3900
AACTAAAACA AAATTCTGGA GGTCCTCCCC AGAACCACTG CCCCCOGGAG TATATGAGCC 3960
AGGATACCTG GGGGGCAAGG ACCCCCGTGT ACAGAATGGC CCATCCCTAC AACAGGTACT 4020
ACGTGACCAA CTGAAACCCT TTGCGGACCC CCGCGGCCGC ATGCCTGAGC CTGGCCTACT 4080
GGAGGCTGCG GTTGAGACTG TAACATCCAT GTTAGAACAG ACAATGGATA CCCCAAGCCC 4140
GTGGTCTTAC GCTGATGCCT GCCAATCTCT TGACAAAACT ACTAGTTCGG GGTACCCTCA 4200
CCATAAAAGG AAGAATGATG ATTGGAATGG CACCACCTTC GTTGGAGAGC TCGGTGAGCA 4260
AGCTGCACAC GCCAACAATA TGTATGAGAA TGCTAAACAT ATGAAACCCA TTTACACTGC 4320
AGCCTTAAAA GATGAACTAG TCAAGCCAGA AAAGATTTAT CAAAAAGTCA AGAAGCGTCT 4380
ACTATGGGGC GCCGATCTCG GAACAGTGGT CAGGGCCGCC CGGGCTTTTG GCCCATTTTG 4440
TGACGCTATA AAATCACATG TCATCAAATT GCCAATAAAA GTTGGCATGA ACACAATAGA 4500
AGATGGCCCC [illegible] CTGAGCATGC TAAATATAAG AATCATTTTG ATttCAGATTA 4560
TACAGCATGG GACTCAACAC AAAATAGACA AATTATGACA GAATCCTTCT CCATTATGTC 4620
GCGCCTTACG GCCTCACCAG AATTGGCCGA GGTTGTGGCC CAAGATTTGC TAGCACCATC 4680
TGAGATGGAT GTAGGTGATT ATGTCATCAG GGTCAAAGAG GGGCTGCCAT CTGGATTCCC 4740
ATGTACTTCC CAGGTGAACA GCATAAATCA CTGGATAATT ACTCTCTGTG CACTGTCTGA 4800
GGCCACTGGT TTATCACCTG ATGTGGTGCA ATCCATGTCA TATTTCTCAT TTTATGGTGA 4860
TGATGAGATT GTGTCAACTG ACATAGATTT TGACCCAGCC OGCCTCACTC AAATTCTCAA 4920
GGAATATGGC CTCAAACCAA CAAGGCCTGA CAAAACAGAA GGACCAATAC AAGTGAGGAA 4980
AAATGTGGAT GGACTGGTCT TCKGCGGCG C^CCATTTCC CGTGATGCGG CAGGGTTCCA 5040
AGGCAGGTXA GMAGGGCCT OGATTGAACG CCAAATCTTC TCGACCCGCG GGCCCAATGA 5100
TTCAGATCCA TCAGAGACTC TAGTGCCACA CACTCAAAGA AAAATACAGT TGATTTCACT 5160
TCTAGGGGAA GCTTCACTCC ATGGTGAGAA ATTTTACAGA AAGATTTCCA GCAAGGTCAT 5220
ACATGAAATC AAGACTGGTG GATTGGAAAT GTATGTCCCA GGATGGCAGG CCATGTTCCG 5280
CTGGATGCGC TTCCATGACC TCGGATTGTG GACAGGAGAT CGCGATCTTC TGCCCGAATT 5340
CGTAAATGAT GATGGCGTCT AAGGACGCTA CATCAAGCGT GGATGGCGCT AGTGGCGCTG 5400
GTCAGTTGGT ACCGGAGGTT AATGCTTCTG ACCCTCTTGC AATGGATCCT GTAGCAGGTT 5460
CTTCGACAGC AGTGGCGACT GCTGGACAAG TTAATCCTAT TGATCCCTGG ATAATTAATA 5520
AXTTTGTGCA AGCCCCCCAA GGTGAATTTA CTATTTCCCC AAATAATACC CCCGGTGATG 5580
TTTTGTTTGA TTTGAGTTTG GGTCCCCATC TTAATCCTTT CTTGCTCCAT CTATCACAAA 5640
TGTATAATGG TTGGGTTGGT AACATGAGAG TCAGGATTAT GCTAGCTGGT AATGCCTTTA 5700
CTGOGGGGAA GATAATAGTT TCCTGCATAC CCCCTGGTTT TGGTTCACAT AATCTTACTA 5760
TAGCACAAGC AACTCTCTTT CCACATGTGA TTGCTGATGT TAGGACTCTA GACCCCATTG 5820
AGGTGCCTTT GGAAGATGTT AGGAATGTTC TCTTTCATAA TAATGATAGA AATCAACAAA 5880
CCATGCGCCT TGTGTGCATG CTGTACACCC CCCTGGGGAG TGGTGGTOGT ACTGOTGATT 5940
CTTTTGTAGT TGCAGGGCGA GTTATGACTT GCCCCAGTCC TGATTTTAAT TTCTTGTTTT 6000
TAGTCCCTCC TACGGTGGAG CAGAAAACCA GGCCCTTCAC ACTCCCAAAT CTGGCATTGA 6060
GTTCTCTGTC TAACTCAOGT GCCCCTCTCC CAATCAGTAG TATGGGCATT TCCCCAGACA 6120
ATGTCCAGAG TGTGCAGTTC CAAAATGGTC GGTGTACTCT GGATGGCCGC CTGGTTGGCA 6180
CCACCCCAGT TTCATTGTCA CATGTTGCCA AGATAAGAGG GACCTCCAAT GGCACTGTAA 6240
TCAACCTTAC TGAATTGGAT GGCACACCCT TTCACCCTTT TGAGGGCCCT GCCCCCATTG 6300
GGTTTCCAGA CCTOGGTGGT TGTGATTGGC ATATCAATAT GACACAGTTT GGCCATTCTA 6360
GCCAGACCCA GTATGATGTA GACACCACCC CTGACACTTT TGTCCCCCAT C7TGGTTCAA 6420
TTCAGGCAAA TGGCATTGGC AGTGGTAATT ATGTTGGTGT TCTTAGCTGG ATTTCCCCCC 6480
CATCACACCC GTCTGGCTCC CAAGTTGACC TTTGGAAGAT CCCCAATTAT GGGTGAAGTA 6540
TTACGGAGGC AACACATCTA GCCCCTTCTG TATACCCCCC TGGTTTCGGA GAGGTATTGG 6600
TCTTTTTCAT GTCAAAAATG CGAGGTCCTG GTGCTTATAA TTTGCCCTGT CTATTACCAC 6660
AAGAGTACAT TTCACATCTT GCTAGTGAAC AAGCCCCTAC TGTAGGTGAG GCTGCCCTGC 6720
TCCACTATGT TGACCCTGAT ACCGGTCGGA ATCTTGGGGA ATTCAAAGCA TACCCTGATG 6780
GTTTCCTCAC TTGTGTCCCC AATGGGGCTA GCTCGGGTCC AGAAGAGCTG CCGATCAATG 6840
GGGTCTTTGT CTTTGTTTCA TGGGTGTCCA GATTTTATCA ATTAAAGCCT GTGGGAACTG 6900
CCAGCTCGGC AAGAGGTAGG CTTGGTCTGC GCCGATAATG GCCCAAGCCA TAATTGGTGC 6960
AATTGCTGCT TCCACAGCAG GTAGTGCTCT GGGAGGGGGC ATACAGGTTG GTGGCGAAGC 7020
GGCCCTCCAA AGCCAAAGGT ATCAAGAAAA TTTGGAACTG CAAGAAAATT CTTTTAAACA 7080
TGACAGGGAA ATGATTGGGT ATCAGGTTGA AGCTTCAAAT GAATTATTGG CTAAAAATTT 7140
GGCAACTAGA TATTCACTCC TCCGTGCTGG GGGTTTGACC AGTGCTGATG CAGCAAGATC 7200
TGTGGCAGGA GCTCGAGTCA CCCGCATTGT AGATTGGAAT GGCGTGAGAG TGTCTGCTCC 7260
CGAGTCCTCT GCTACCACAT TGAGATCCGG TGGCTTCATG TGAGTTCCCA TACCATTTGC 7320
CTCTAAGCAA AAACAGGTTC AATCATCTGG TATTAGTAAT CCAAATTATT CCCCTTCATC 7380
CATTTCTCGA ACCACTAGTT GGGTCGAGTC ACAAAACTCA TCGAGATTTG GAAATCTTTC 7440
TCCATACCAC GCGGAGGCTC TCAATACAGT GTGGTTGACT CCACCCGGTT CAACAGCCTC 7500
TTCTACACTG TCTTCTGTGC CACGTGGTTA TTTCAATACA GACAGGTTGC CATTATTCGC 7560
[illegible] CGMGMGTT GTAATATGAA ATGTGGGCAT CMOTTCATT TAATTAGGTT 7620
[illegible] [illegible] OTAAAAAAAA AAAAAAAAAA [illegible] [illegible] 7680
AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 7740
AAAAAAAAAA AAA 7753
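
A listing of this kind lends itself to a simple consistency check. The
Python sketch below assumes rows written as groups of ten bases followed
by the cumulative end position, and uses the first three rows of Table 2
as input. It verifies the running position and translates from
nucleotide 146, the start used in Table 3. The helper names are
illustrative, and the codon table is the standard genetic code.

```python
import re

# First three rows of Table 2: six groups of ten bases, then end position.
TABLE2_ROWS = """\
GGCGTCAAAA GACGTCGTTC CTACTGCTGC TAGCAGTGAA AATGCTAACA ACAATAGTAG 60
TATTAAGTCT CGTCTATTGG CGAGACTCAA GGGTTCAGGT GGGGCTACGT CCCCACCCAA 120
CTCGATAAAG ATAACCAACC AAGATATGGC TCTGGGGCTG ATTGGACAGG TCCCAGCGCC 180
"""

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_rows(text):
    """Join the base groups of each row, checking the printed end position."""
    chunks = []
    total = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        *groups, end = fields
        chunk = "".join(groups)
        if not re.fullmatch("[ACGT]+", chunk):
            raise ValueError(f"unreadable bases in row ending {end}")
        total += len(chunk)
        if total != int(end):
            raise ValueError(f"row ending {end} has running length {total}")
        chunks.append(chunk)
    return "".join(chunks)


def translate(seq, start):
    """Translate complete codons of seq from 1-based position `start`."""
    codons = (seq[i:i + 3] for i in range(start - 1, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


genome_start = parse_rows(TABLE2_ROWS)
print(len(genome_start))
print(translate(genome_start, 146))
```

For these three rows the translation from nucleotide 146 begins
Met-Ala-Leu-Gly-Leu-Ile-Gly-Gln-Val, which matches the first residues
listed in Table 3. Rows containing unreadable characters would be
rejected by the check rather than silently accepted.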
Table 3. The amino acid sequence deduced from nucleotides 146 through
5359 of the Norwalk virus genome shown in Table 2.

CTCGATAAAG ATAACCAACC AAGAT ATG GCT CTG GGG CTG ATT GGA CAG GTC 172 
Met Ala Leu Gly Leu Ile Gly Gln Val 
1 5 

CCA GCG CCA AAG GCC ACA TCC GTC GAT GTC CCT AAA CAA CAG AGG GAT 220 
Pro Ala Pro Lys Ala Thr Ser Val Asp Val Pro Lys Gln Gln Arg Asp 
10 15 20 25 

AGA CCA CCA CGG ACT GTT GCC GAA GTT CAA CAA AAT TTG CGT TGG ACT 268 
Arg Pro Pro Arg Thr Val Ala Glu Val Gln Gln Asn Leu Arg Trp Thr 
30 35 40 

GAG AGA CCA CAA GAC CAG AAT GTT AAG ACG TGG GAT GAG CTT GAC CAC 316 
Glu Arg Pro Gln Asp Gln Asn Val Lys Thr Trp Asp Glu Leu Asp His 
45 50 55 

ACA ACA AAA CAA CAG ATA CTT GAT GAA CAC GCT GAG TGG TTT GAT GCC 364 
Thr Thr Lys Gln Gln Ile Leu Asp Glu His Ala Glu Trp Phe Asp Ala 
60 65 70 

GGT GGC TTA GGT CCA AGT ACA CTA CCC ACT AGT CAT GAA CGG TAC ACA 412 
Gly Gly Leu Gly Pro Ser Thr Leu Pro Thr Ser His Glu Arg Tyr Thr 
75 80 85 

CAT GAG AAT GAT GAA GGC CAC CAG GTA AAG TGG TCG GCT AGG GAA GGT 460 
His Glu Asn Asp Glu Gly His Gln Val Lys Trp Ser Ala Arg Glu Gly 
90 95 100 105 

GTA GAC CTT GGC ATA TCC GGG CTC ACG ACG GTG TCT GGG CCT GAG TGG 508 
Val Asp Leu Gly Ile Ser Gly Leu Thr Thr Val Ser Gly Pro Glu Trp 
110 115 120 

AAT ATG TGC CCG CTA CCA CCA GTT GAC CAA AGG AGC ACG ACA CCT GCA 556 
Asn Met Cys Pro Leu Pro Pro Val Asp Gln Arg Ser Thr Thr Pro Ala 
125 130 135 

ACT GAG CCC ACA ATT GGT GAC ATG ATC GAA TTC TAT GAA GGG CAC ATC 604 
Thr Glu Pro Thr Ile Gly Asp Met Ile Glu Phe Tyr Glu Gly His Ile 
140 145 150 

TAT CAT TAT GCT ATA TAC ATA GGT CAA GGC AAG ACG GTG GGT GTA CAC 652 
Tyr His Tyr Ala Ile Tyr Ile Gly Gln Gly Lys Thr Val Gly Val His 
155 160 165 

TCC CCT CAA GCA GCC TTC TCA ATA ACG AGG ATC ACC ATA CAG CCC ATA 700 
Ser Pro Gln Ala Ala Phe Ser Ile Thr Arg Ile Thr Ile Gln Pro Ile 
170 175 180 185 

TCA GCT TGG TGG CGA GTC TGT TAT GTC CCA CAA CCA AAA CAG AGG CTC 748 
Ser Ala Trp Trp Arg Val Cys Tyr Val Pro Gln Pro Lys Gln Arg Leu 
190 195 200 

ACA TAC GAC CAA CTC AAA GAA TTA GAA AAT GAA CCA TGG CCG TAT GCC 796 
Thr Tyr Asp Gln Leu Lys Glu Leu Glu Asn Glu Pro Trp Pro Tyr Ala 
205 210 215 

GCA GTC ACG AAC AAC TGC TTC GAA TTT TGT TGC CAG GTC ATG TGC TTG 844 
Ala Val Thr Asn Asn Cys Phe Glu Phe Cys Cys Gln Val Met Cys Leu 
220 225 230 
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Table 3, continued 

GAA GAT ACT TGG TTG CAA AGG AAG CTC ATC TCC TCT GGC CGG TTT TAC 892 
Glu Asp Thr Trp Leu Gln Arg Lys Leu Ile Ser Ser Gly Arg Phe Tyr 
235 240 245 

CAC CCG ACC CAA GAT TGG TCC CGA GAC ACT CCA GAA TTC CAA CAA GAC 940 
His Pro Thr Gln Asp Trp Ser Arg Asp Thr Pro Glu Phe Gln Gln Asp 
250 255 260 265 

AGC AAG TTA GAG ATG GTT AGG GAT GCA GTG CTA GCC GCT ATA AAT GGG 988 
Ser Lys Leu Glu Met Val Arg Asp Ala Val Leu Ala Ala Ile Asn Gly 
270 275 280 

TTG GTG TCG CGG CCA TTT AAA GAT CTT CTG GGT AAG CTC AAA CCC TTG 1036 
Leu Val Ser Arg Pro Phe Lys Asp Leu Leu Gly Lys Leu Lys Pro Leu 
285 290 295 

AAC GTG CTT AAC TTA CTT TCA AAC TGT GAT TGG ACG TTC ATG GGG GTC 1084 
Asn Val Leu Asn Leu Leu Ser Asn Cys Asp Trp Thr Phe Met Gly Val 
300 305 310 

GTG GAG ATG GTG GTC CTC CTT TTA GAA CTC TTT GGA ATC TTT TGG AAC 1132 
Val Glu Met Val Val Leu Leu Leu Glu Leu Phe Gly Ile Phe Trp Asn 
315 320 325 

CCA CCT GAT GTT TCC AAC TTT ATA GCT TCA CTC CTG CCA GAT TTC CAT 1180 
Pro Pro Asp Val Ser Asn Phe Ile Ala Ser Leu Leu Pro Asp Phe His 
330 335 340 345 

CTA CAG GGC CCC GAG GAC CTT GCC AGG GAT CTC GTG CCA ATA GTA TTG 1228 
Leu Gln Gly Pro Glu Asp Leu Ala Arg Asp Leu Val Pro Ile Val Leu 
350 355 360 

GGG GGG ATC GGC TTA GCC ATA GGA TTC ACC AGA GAC AAG GTA AGT AAG 1276 
Gly Gly Ile Gly Leu Ala Ile Gly Phe Thr Arg Asp Lys Val Ser Lys 
365 370 375 

ATG ATG AAG AAT GCT GTT GAT GGA CTT CGT GCG GCA ACC CAG CTC GGT 1324 
Met Met Lys Asn Ala Val Asp Gly Leu Arg Ala Ala Thr Gln Leu Gly 
380 385 390 

CAA TAT GGC CTA GAA ATA TTC TCA TTA CTA AAG AAG TAC TTC TTC GGT 1372 
Gln Tyr Gly Leu Glu Ile Phe Ser Leu Leu Lys Lys Tyr Phe Phe Gly 
395 400 405 

GGT GAT CAA ACA GAG AAA ACC CTA AAA GAT ATT GAG TCA GCA GTT ATA 1420 
Gly Asp Gln Thr Glu Lys Thr Leu Lys Asp Ile Glu Ser Ala Val Ile 
410 415 420 425 

GAT ATG GAA GTA CTA TCA TCT ACA TCA GTG ACT CAG CTC GTG AGG GAC 1468 
Asp Met Glu Val Leu Ser Ser Thr Ser Val Thr Gln Leu Val Arg Asp 
430 435 440 

AAA CAG TCT GCA CGG GCT TAT ATG GCC ATC TTA GAT AAT GAA GAA GAA 1516 
Lys Gln Ser Ala Arg Ala Tyr Met Ala Ile Leu Asp Asn Glu Glu Glu 
445 450 455 

AAG GCA AGG AAA TTA TCT GTC AGG AAT GCC GAC CCA CAC GTA GTA TCC 1564 
Lys Ala Arg Lys Leu Ser Val Arg Asn Ala Asp Pro His Val Val Ser 
460 465 470 

TCT ACC AAT GCT CTC ATA TCC CGG ATC TCA ATG GCT AGG GCT GCA TTG 1612 
Ser Thr Asn Ala Leu Ile Ser Arg Ile Ser Met Ala Arg Ala Ala Leu 
475 480 485 
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Table 3, continued 

GCC AAG GCT CAA GCT GAA ATG ACC AGC AGG ATG CGT CCT GTG GTC ATT 1660 
Ala Lys Ala Gln Ala Glu Met Thr Ser Arg Met Arg Pro Val Val Ile 
490 495 500 505 

ATG ATG TGT GGG CCC CCT GGT ATA GGT AAA ACC AAG GCA GCA GAA CAT 1708 
Met Met Cys Gly Pro Pro Gly Ile Gly Lys Thr Lys Ala Ala Glu His 
510 515 520 

CTG GCT AAA CGC CTA GCC AAT GAG ATA CGG CCT GGT GGT AAG GTT GGG 1756 
Leu Ala Lys Arg Leu Ala Asn Glu Ile Arg Pro Gly Gly Lys Val Gly 
525 530 535 

CTG GTC CCA CGG GAG GCA GTG GAT CAT TGG GAT GGA TAT CAC GGA GAG 1804 
Leu Val Pro Arg Glu Ala Val Asp His Trp Asp Gly Tyr His Gly Glu 
540 545 550 

GAA GTG ATG CTG TGG GAC GAC TAT GGA ATG ACA AAG ATA CAG GAA GAC 1852 
Glu Val Met Leu Trp Asp Asp Tyr Gly Met Thr Lys Ile Gln Glu Asp 
555 560 565 

TGT AAT AAA CTG CAA GCC ATA GCC GAC TCA GCC CCC CTA ACA CTC AAT 1900 
Cys Asn Lys Leu Gln Ala Ile Ala Asp Ser Ala Pro Leu Thr Leu Asn 
570 575 580 585 

TGT GAC CGA ATA GAA AAC AAG GGA ATG CAA TTT GTG TCT GAT GCT ATA 1948 
Cys Asp Arg Ile Glu Asn Lys Gly Met Gln Phe Val Ser Asp Ala Ile 
590 595 600 

GTC ATC ACC ACC AAT GCT CCT GGC CCA GCC CCA GTG GAC TTT GTC AAC 1996 
Val Ile Thr Thr Asn Ala Pro Gly Pro Ala Pro Val Asp Phe Val Asn 
605 610 615 

CTC GGG CCT GTT TGC CGA AGG GTG GAC TTC CTT GTG TAT TGC ACG GCA 2044 
Leu Gly Pro Val Cys Arg Arg Val Asp Phe Leu Val Tyr Cys Thr Ala 
620 625 630 

CCT GAA GTT GAA CAC ACG AGG AAA GTC AGT CCT GGG GAC ACA ACT GCA 2092 
Pro Glu Val Glu His Thr Arg Lys Val Ser Pro Gly Asp Thr Thr Ala 
635 640 645 

CTG AAA GAC TGC TTC AAG CCC GAT TTC TCA CAT CTA AAA ATG GAG TTG 2140 
Leu Lys Asp Cys Phe Lys Pro Asp Phe Ser His Leu Lys Met Glu Leu 
650 655 660 665 

GCT CCC CAA GGG GGC TTT GAT AAC CAA GGG AAT ACC CCG TTT GGT AAG 2188 
Ala Pro Gln Gly Gly Phe Asp Asn Gln Gly Asn Thr Pro Phe Gly Lys 
670 675 680 

GGT GTG ATG AAG CCC ACC ACC ATA AAC AGG CTG TTA ATC CAG GCT GTA 2236 
Gly Val Met Lys Pro Thr Thr Ile Asn Arg Leu Leu Ile Gln Ala Val 
685 690 695 

GCC TTG ACG ATG GAG AGA CAG GAT GAG TTC CAA CTC CAG GGG CCT ACG 2284 
Ala Leu Thr Met Glu Arg Gln Asp Glu Phe Gln Leu Gln Gly Pro Thr 
700 705 710 

TAT GAC TTT GAT ACT GAC AGA GTA GCT GCG TTC ACG AGG ATG GCC CGA 2332 
Tyr Asp Phe Asp Thr Asp Arg Val Ala Ala Phe Thr Arg Met Ala Arg 
715 720 725 

GCC AAC GGG TTG GGT CTC ATA TCC ATG GCC TCC CTA GGC AAA AAG CTA 2380 
Ala Asn Gly Leu Gly Leu Ile Ser Met Ala Ser Leu Gly Lys Lys Leu 
730 735 740 745 



WO 94/05700 



PCT/US93/08447 



47 



Table 3, continued 

CGC AGT GTC ACC ACT ATT GAA GGA TTA AAG AAT GCT CTA TCA GGC TAT 2428 
Arg Ser Val Thr Thr Ile Glu Gly Leu Lys Asn Ala Leu Ser Gly Tyr 
750 755 760 

AAA ATA TCA AAA TGC AGT ATA CAA TGG CAG TCA AGG GTG TAC ATT ATA 2476 
Lys Ile Ser Lys Cys Ser Ile Gln Trp Gln Ser Arg Val Tyr Ile Ile 
765 770 775 

GAA TCA GAT GGT GCC AGT GTA CAA ATC AAA GAA GAC AAG CAA GCT TTG 2524 
Glu Ser Asp Gly Ala Ser Val Gln Ile Lys Glu Asp Lys Gln Ala Leu 
780 785 790 

ACC CCT CTG CAG CAG ACA ATT AAC ACG GCC TCA CTT GCC ATC ACT CGA 2572 
Thr Pro Leu Gln Gln Thr Ile Asn Thr Ala Ser Leu Ala Ile Thr Arg 
795 800 805 

CTC AAA GCA GCT AGG GCT GTG GCA TAC GCT TCA TGT TTC CAG TCC GCC 2620 
Leu Lys Ala Ala Arg Ala Val Ala Tyr Ala Ser Cys Phe Gln Ser Ala 
810 815 820 825 

ATA ACT ACC ATA CTA CAA ATG GCG GGA TCT GCG CTC GTT ATT AAT CGA 2668 
Ile Thr Thr Ile Leu Gln Met Ala Gly Ser Ala Leu Val Ile Asn Arg 
830 835 840 

GCG GTC AAG CGT ATG TTT GGT ACC CGT ACA GCA GCC ATG GCA TTA GAA 2716 
Ala Val Lys Arg Met Phe Gly Thr Arg Thr Ala Ala Met Ala Leu Glu 
845 850 855 

GGA CCT GGG AAA GAA CAT AAT TGC AGG GTC CAT AAG GCT AAG GAA GCT 2764 
Gly Pro Gly Lys Glu His Asn Cys Arg Val His Lys Ala Lys Glu Ala 
860 865 870 

GGA AAG GGG CCC ATA GGT CAT GAT GAC ATG GTA GAA AGG TTT GGC CTA 2812 
Gly Lys Gly Pro Ile Gly His Asp Asp Met Val Glu Arg Phe Gly Leu 
875 880 885 

TGT GAA ACT GAA GAG GAG GAG AGT GAG GAC CAA ATT CAA ATG GTA CCA 2860 
Cys Glu Thr Glu Glu Glu Glu Ser Glu Asp Gln Ile Gln Met Val Pro 
890 895 900 905 

AGT GAT GCC GTC CCA GAA GGA AAG AAC AAA GGC AAG ACC AAA AAG GGA 2908 
Ser Asp Ala Val Pro Glu Gly Lys Asn Lys Gly Lys Thr Lys Lys Gly 
910 915 920 

CGT GGT CGC AAA AAT AAC TAT AAT GCA TTC TCT CGC CGT GGT CTG AGT 2956 
Arg Gly Arg Lys Asn Asn Tyr Asn Ala Phe Ser Arg Arg Gly Leu Ser 
925 930 935 

GAT GAA GAA TAT GAA GAG TAC AAA AAG ATC AGA GAA GAA AAG AAT GGC 3004 
Asp Glu Glu Tyr Glu Glu Tyr Lys Lys Ile Arg Glu Glu Lys Asn Gly 
940 945 950 

AAT TAT AGT ATA CAA GAA TAC TTG GAG GAC CGC CAA CGA TAT GAG GAA 3052 
Asn Tyr Ser Ile Gln Glu Tyr Leu Glu Asp Arg Gln Arg Tyr Glu Glu 
955 960 965 

GAA TTA GCA GAG GTA CAG GCA GGT GGT GAT GGT GGC ATA GGA GAA ACT 3100 
Glu Leu Ala Glu Val Gln Ala Gly Gly Asp Gly Gly Ile Gly Glu Thr 
970 975 980 985 

GAA ATG GAA ATC CGT CAC AGG GTC TTC TAT AAA TCC AAG AGT AAG AAA 3148 
Glu Met Glu Ile Arg His Arg Val Phe Tyr Lys Ser Lys Ser Lys Lys 
990 995 1000 
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Table 3, continued 

CAC CAA CAA GAG CAA CGG CGA CAA CTT GGT CTA GTG ACT GGA TCA GAC 3196 
His Gln Gln Glu Gln Arg Arg Gln Leu Gly Leu Val Thr Gly Ser Asp 
1005 1010 1015 

ATC AGA AAA CGT AAG CCC ATT GAC TGG ACC CCG CCA AAG AAT GAA TGG 3244 
Ile Arg Lys Arg Lys Pro Ile Asp Trp Thr Pro Pro Lys Asn Glu Trp 
1020 1025 1030 

GCA GAT GAT GAC AGA GAG GTG GAT TAT AAT GAA AAG ATC AAT TTT GAA 3292 
Ala Asp Asp Asp Arg Glu Val Asp Tyr Asn Glu Lys Ile Asn Phe Glu 
1035 1040 1045 

GCT CCC CCG ACA CTA TGG AGC CGA GTC ACA AAG TTT GGA TCA GGA TGG 3340 
Ala Pro Pro Thr Leu Trp Ser Arg Val Thr Lys Phe Gly Ser Gly Trp 
1050 1055 1060 1065 

GGC TTT TGG GTC AGC CCG ACA GTG TTC ATC ACA ACC ACA CAT GTA GTG 3388 
Gly Phe Trp Val Ser Pro Thr Val Phe Ile Thr Thr Thr His Val Val 
1070 1075 1080 

CCA ACT GGT GTG AAA GAA TTC TTT GGT GAG CCC CTA TCT AGT ATA GCA 3436 
Pro Thr Gly Val Lys Glu Phe Phe Gly Glu Pro Leu Ser Ser Ile Ala 
1085 1090 1095 

ATC CAC CAA GCA GGT GAG TTC ACA CAA TTC AGG TTC TCA AAG AAA ATG 3484 
Ile His Gln Ala Gly Glu Phe Thr Gln Phe Arg Phe Ser Lys Lys Met 
1100 1105 1110 

CGC CCT GAC TTG ACA GGT ATG GTC CTT GAA GAA GGT TGC CCT GAA GGG 3532 
Arg Pro Asp Leu Thr Gly Met Val Leu Glu Glu Gly Cys Pro Glu Gly 
1115 1120 1125 

ACA GTC TGC TCA GTC CTA ATT AAA CGG GAT TCG GGT GAA CTA CTT CCG 3580 
Thr Val Cys Ser Val Leu Ile Lys Arg Asp Ser Gly Glu Leu Leu Pro 
1130 1135 1140 1145 

CTA GCC GTC CGT ATG GGG GCT ATT GCC TCC ATG AGG ATA CAG GGT CGG 3628 
Leu Ala Val Arg Met Gly Ala Ile Ala Ser Met Arg Ile Gln Gly Arg 
1150 1155 1160 

CTT GTC CAT GGC CAA TCA GGG ATG TTA CTG ACA GGG GCC AAT GCA AAG 3676 
Leu Val His Gly Gln Ser Gly Met Leu Leu Thr Gly Ala Asn Ala Lys 
1165 1170 1175 

GGG ATG GAT CTT GGC ACT ATA CCA GGA GAC TGC GGG GCA CCA TAC GTC 3724 
Gly Met Asp Leu Gly Thr Ile Pro Gly Asp Cys Gly Ala Pro Tyr Val 
1180 1185 1190 

CAC AAG CGC GGG AAT GAC TGG GTT GTG TGT GGA GTC CAC GCT GCA GCC 3772 
His Lys Arg Gly Asn Asp Trp Val Val Cys Gly Val His Ala Ala Ala 
1195 1200 1205 

ACA AAG TCA GGC AAC ACC GTG GTC TGC GCT GTA CAG GCT GGA GAG GGC 3820 
Thr Lys Ser Gly Asn Thr Val Val Cys Ala Val Gln Ala Gly Glu Gly 
1210 1215 1220 1225 

GAA ACC GCA CTA GAA GGT GGA GAC AAG GGG CAT TAT GCC GGC CAC GAG 3868 
Glu Thr Ala Leu Glu Gly Gly Asp Lys Gly His Tyr Ala Gly His Glu 
1230 1235 1240 

ATT GTG AGG TAT GGA AGT GGC CCA GCA CTG TCA ACT AAA ACA AAA TTC 3916 
Ile Val Arg Tyr Gly Ser Gly Pro Ala Leu Ser Thr Lys Thr Lys Phe 
1245 1250 1255 
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Table 3, continued 

TGG AGG TCC TCC CCA GAA CCA CTG CCC CCC GGA GTA TAT GAG CCA GCA 3964 
Trp Arg Ser Ser Pro Glu Pro Leu Pro Pro Gly Val Tyr Glu Pro Ala 
1260 1265 1270 

TAC CTG GGG GGC AAG GAC CCC CGT GTA CAG AAT GGC CCA TCC CTA CAA 4012 
Tyr Leu Gly Gly Lys Asp Pro Arg Val Gln Asn Gly Pro Ser Leu Gln 
1275 1280 1285 

CAG GTA CTA CGT GAC CAA CTG AAA CCC TTT GCG GAC CCC CGC GGC CGC 4060 
Gln Val Leu Arg Asp Gln Leu Lys Pro Phe Ala Asp Pro Arg Gly Arg 
1290 1295 1300 1305 

ATG CCT GAG CCT GGC CTA CTG GAG GCT GCG GTT GAG ACT GTA ACA TCC 4108 
Met Pro Glu Pro Gly Leu Leu Glu Ala Ala Val Glu Thr Val Thr Ser 
1310 1315 1320 

ATG TTA GAA CAG ACA ATG GAT ACC CCA AGC CCG TGG TCT TAC GCT GAT 4156 
Met Leu Glu Gln Thr Met Asp Thr Pro Ser Pro Trp Ser Tyr Ala Asp 
1325 1330 1335 

GCC TGC CAA TCT CTT GAC AAA ACT ACT AGT TCG GGG TAC CCT CAC CAT 4204 
Ala Cys Gln Ser Leu Asp Lys Thr Thr Ser Ser Gly Tyr Pro His His 
1340 1345 1350 

AAA AGG AAG AAT GAT GAT TGG AAT GGC ACC ACC TTC GTT GGA GAG CTC 4252 
Lys Arg Lys Asn Asp Asp Trp Asn Gly Thr Thr Phe Val Gly Glu Leu 
1355 1360 1365 

GGT GAG CAA GCT GCA CAC GCC AAC AAT ATG TAT GAG AAT GCT AAA CAT 4300 
Gly Glu Gln Ala Ala His Ala Asn Asn Met Tyr Glu Asn Ala Lys His 
1370 1375 1380 1385 

ATG AAA CCC ATT TAC ACT GCA GCC TTA AAA GAT GAA CTA GTC AAG CCA 4348 
Met Lys Pro Ile Tyr Thr Ala Ala Leu Lys Asp Glu Leu Val Lys Pro 
1390 1395 1400 

GAA AAG ATT TAT CAA AAA GTC AAG AAG CGT CTA CTA TGG GGC GCC GAT 4396 
Glu Lys Ile Tyr Gln Lys Val Lys Lys Arg Leu Leu Trp Gly Ala Asp 
1405 1410 1415 

CTC GGA ACA GTG GTC AGG GCC GCC CGG GCT TTT GGC CCA TTT TGT GAC 4444 
Leu Gly Thr Val Val Arg Ala Ala Arg Ala Phe Gly Pro Phe Cys Asp 
1420 1425 1430 

GCT ATA AAA TCA CAT GTC ATC AAA TTG CCA ATA AAA GTT GGC ATG AAC 4492 
Ala Ile Lys Ser His Val Ile Lys Leu Pro Ile Lys Val Gly Met Asn 
1435 1440 1445 

ACA ATA GAA GAT GGC CCC CTC ATC TAT GCT GAG CAT GCT AAA TAT AAG 4540 
Thr Ile Glu Asp Gly Pro Leu Ile Tyr Ala Glu His Ala Lys Tyr Lys 
1450 1455 1460 1465 

AAT CAT TTT GAT GCA GAT TAT ACA GCA TGG GAC TCA ACA CAA AAT AGA 4588 
Asn His Phe Asp Ala Asp Tyr Thr Ala Trp Asp Ser Thr Gln Asn Arg 
1470 1475 1480 

CAA ATT ATG ACA GAA TCC TTC TCC ATT ATG TCG CGC CTT ACG GCC TCA 4636 
Gln Ile Met Thr Glu Ser Phe Ser Ile Met Ser Arg Leu Thr Ala Ser 
1485 1490 1495 

CCA GAA TTG GCC GAG GTT GTG GCC CAA GAT TTG CTA GCA CCA TCT GAG 4684 
Pro Glu Leu Ala Glu Val Val Ala Gln Asp Leu Leu Ala Pro Ser Glu 
1500 1505 1510 
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Table 3, continued 

ATG GAT GTA GGT GAT TAT GTC ATC AGG GTC AAA GAG GGG CTG CCA TCT 4732 
Met Asp Val Gly Asp Tyr Val Ile Arg Val Lys Glu Gly Leu Pro Ser 
1515 1520 1525 

GGA TTC CCA TGT ACT TCC CAG GTG AAC AGC ATA AAT CAC TGG ATA ATT 4780 
Gly Phe Pro Cys Thr Ser Gln Val Asn Ser Ile Asn His Trp Ile Ile 
1530 1535 1540 1545 

ACT CTC TGT GCA CTG TCT GAG GCC ACT GGT TTA TCA CCT GAT GTG GTG 4828 
Thr Leu Cys Ala Leu Ser Glu Ala Thr Gly Leu Ser Pro Asp Val Val 
1550 1555 1560 

CAA TCC ATG TCA TAT TTC TCA TTT TAT GGT GAT GAT GAG ATT GTG TCA 4876 
Gln Ser Met Ser Tyr Phe Ser Phe Tyr Gly Asp Asp Glu Ile Val Ser 
1565 1570 1575 

ACT GAC ATA GAT TTT GAC CCA GCC CGC CTC ACT CAA ATT CTC AAG GAA 4924 
Thr Asp Ile Asp Phe Asp Pro Ala Arg Leu Thr Gln Ile Leu Lys Glu 
1580 1585 1590 

TAT GGC CTC AAA CCA ACA AGG CCT GAC AAA ACA GAA GGA CCA ATA CAA 4972 
Tyr Gly Leu Lys Pro Thr Arg Pro Asp Lys Thr Glu Gly Pro Ile Gln 
1595 1600 1605 

GTG AGG AAA AAT GTG GAT GGA CTG GTC TTC TTG CGG CGC ACC ATT TCC 5020 
Val Arg Lys Asn Val Asp Gly Leu Val Phe Leu Arg Arg Thr Ile Ser 
1610 1615 1620 1625 

CGT GAT GCG GCA GGG TTC CAA GGC AGG TTA GAT AGG GCT TCG ATT GAA 5068 
Arg Asp Ala Ala Gly Phe Gln Gly Arg Leu Asp Arg Ala Ser Ile Glu 
1630 1635 1640 

CGC CAA ATC TTC TGG ACC CGC GGG CCC AAT CAT TCA GAT CCA TCA GAG 5116 
Arg Gln Ile Phe Trp Thr Arg Gly Pro Asn His Ser Asp Pro Ser Glu 
1645 1650 1655 

ACT CTA GTG CCA CAC ACT CAA AGA AAA ATA CAG TTG ATT TCA CTT CTA 5164 
Thr Leu Val Pro His Thr Gln Arg Lys Ile Gln Leu Ile Ser Leu Leu 
1660 1665 1670 

GGG GAA GCT TCA CTC CAT GGT GAG AAA TTT TAC AGA AAG ATT TCC AGC 5212 
Gly Glu Ala Ser Leu His Gly Glu Lys Phe Tyr Arg Lys Ile Ser Ser 
1675 1680 1685 

AAG GTC ATA CAT GAA ATC AAG ACT GGT GGA TTG GAA ATG TAT GTC CCA 5260 
Lys Val Ile His Glu Ile Lys Thr Gly Gly Leu Glu Met Tyr Val Pro 
1690 1695 1700 1705 

GGA TGG CAG GCC ATG TTC CGC TGG ATG CGC TTC CAT GAC CTC GGA TTG 5308 
Gly Trp Gln Ala Met Phe Arg Trp Met Arg Phe His Asp Leu Gly Leu 
1710 1715 1720 

TGG ACA GGA GAT CGC GAT CTT CTG CCC GAA TTC GTA AAT GAT GAT GGC 5356 
Trp Thr Gly Asp Arg Asp Leu Leu Pro Glu Phe Val Asn Asp Asp Gly 
1725 1730 1735 

GTC TAAGGACGCT ACATCAAGCG TGGATGGCGC TAGTGGCGCT GGTCAGTTGG 5409 
Val 
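
The codon-by-codon layout of Table 3 can be cross-checked with a short script. The following is a minimal Python sketch and not part of the original disclosure: it assumes the standard genetic code (the source does not name a translation table), and the names `CODON_TABLE`, `FRAGMENT`, `FRAGMENT_START` and `translate` are illustrative. `FRAGMENT` holds only the first row of Table 3 (nucleotides 121 through 172).

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order.
# Assumption: the source does not state which translation table was used.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# First row of Table 3: genome nucleotides 121..172 (illustrative names).
FRAGMENT = (
    "CTCGATAAAG" "ATAACCAACC" "AAGAT"   # untranslated leader, 121..145
    "ATGGCTCTGGGGCTGATTGGACAGGTC"       # first nine codons, 146..172
)
FRAGMENT_START = 121  # genome coordinate of FRAGMENT[0]


def translate(seq, start, end, offset=1):
    """Translate genome coordinates start..end (1-based, inclusive).

    `offset` is the genome coordinate of seq[0]. One-letter amino acid
    codes are returned; a trailing partial codon is ignored.
    """
    orf = seq[start - offset:end - offset + 1]
    usable = len(orf) - len(orf) % 3
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    # Prints MALGLIGQV, i.e. Met Ala Leu Gly Leu Ile Gly Gln Val.
    print(translate(FRAGMENT, 146, 172, offset=FRAGMENT_START))
```

Passing the full 7753-nucleotide sequence with `offset=1` and the coordinates 146 and 5359 would, under the same assumption, reproduce the whole of Table 3.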
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Table 4. The amino acid sequence deduced from nucleotides 5346 through 
6935 of the Norwalk virus genome shown in Table 2. 

CGTAA ATG ATG ATG GCG TCT AAG GAC GCT ACA TCA AGC GTG GAT GGC 5387 
Met Met Met Ala Ser Lys Asp Ala Thr Ser Ser Val Asp Gly 
1 5 10 

GCT AGT GGC GCT GGT CAG TTG GTA CCG GAG GTT AAT GCT TCT GAC CCT 5435 
Ala Ser Gly Ala Gly Gln Leu Val Pro Glu Val Asn Ala Ser Asp Pro 
15 20 25 30 

CTT GCA ATG GAT CCT GTA GCA GGT TCT TCG ACA GCA GTC GCG ACT GCT 5483 
Leu Ala Met Asp Pro Val Ala Gly Ser Ser Thr Ala Val Ala Thr Ala 
35 40 45 

GGA CAA GTT AAT CCT ATT GAT CCC TGG ATA ATT AAT AAT TTT GTG CAA 5531 
Gly Gln Val Asn Pro Ile Asp Pro Trp Ile Ile Asn Asn Phe Val Gln 
50 55 60 

GCC CCC CAA GGT GAA TTT ACT ATT TCC CCA AAT AAT ACC CCC GGT GAT 5579 
Ala Pro Gln Gly Glu Phe Thr Ile Ser Pro Asn Asn Thr Pro Gly Asp 
65 70 75 

GTT TTG TTT GAT TTG AGT TTG GGT CCC CAT CTT AAT CCT TTC TTG CTC 5627 
Val Leu Phe Asp Leu Ser Leu Gly Pro His Leu Asn Pro Phe Leu Leu 
80 85 90 

CAT CTA TCA CAA ATG TAT AAT GGT TGG GTT GGT AAC ATG AGA GTC AGG 5675 
His Leu Ser Gln Met Tyr Asn Gly Trp Val Gly Asn Met Arg Val Arg 
95 100 105 110 

ATT ATG CTA GCT GGT AAT GCC TTT ACT GCG GGG AAG ATA ATA GTT TCC 5723 
Ile Met Leu Ala Gly Asn Ala Phe Thr Ala Gly Lys Ile Ile Val Ser 
115 120 125 

TGC ATA CCC CCT GGT TTT GGT TCA CAT AAT CTT ACT ATA GCA CAA GCA 5771 
Cys Ile Pro Pro Gly Phe Gly Ser His Asn Leu Thr Ile Ala Gln Ala 
130 135 140 

ACT CTC TTT CCA CAT GTG ATT GCT GAT GTT AGG ACT CTA GAC CCC ATT 5819 
Thr Leu Phe Pro His Val Ile Ala Asp Val Arg Thr Leu Asp Pro Ile 
145 150 155 

GAG GTG CCT TTG GAA GAT GTT AGG AAT GTT CTC TTT CAT AAT AAT GAT 5867 
Glu Val Pro Leu Glu Asp Val Arg Asn Val Leu Phe His Asn Asn Asp 
160 165 170 

AGA AAT CAA CAA ACC ATG CGC CTT GTG TGC ATG CTG TAC ACC CCC CTC 5915 
Arg Asn Gln Gln Thr Met Arg Leu Val Cys Met Leu Tyr Thr Pro Leu 
175 180 185 190 

CGC ACT GGT GGT GGT ACT GGT GAT TCT TTT GTA GTT GCA GGG CGA GTT 5963 
Arg Thr Gly Gly Gly Thr Gly Asp Ser Phe Val Val Ala Gly Arg Val 
195 200 205 

ATG ACT TGC CCC AGT CCT GAT TTT AAT TTC TTG TTT TTA GTC CCT CCT 6011 
Met Thr Cys Pro Ser Pro Asp Phe Asn Phe Leu Phe Leu Val Pro Pro 
210 215 220 

ACG GTG GAG CAG AAA ACC AGG CCC TTC ACA CTC CCA AAT CTG CCA TTG 6059 
Thr Val Glu Gln Lys Thr Arg Pro Phe Thr Leu Pro Asn Leu Pro Leu 
225 230 235 
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Table 4, continued 

AGT TCT CTG TCT AAC TCA CGT GCC CCT CTC CCA ATC AGT AGT ATG GGC 6107 
Ser Ser Leu Ser Asn Ser Arg Ala Pro Leu Pro Ile Ser Ser Met Gly 
240 245 250 

ATT TCC CCA GAC AAT GTC CAG AGT GTG CAG TTC CAA AAT GGT CGG TGT 6155 
Ile Ser Pro Asp Asn Val Gln Ser Val Gln Phe Gln Asn Gly Arg Cys 
255 260 265 270 

ACT CTG GAT GGC CGC CTG GTT GGC ACC ACC CCA GTT TCA TTG TCA CAT 6203 
Thr Leu Asp Gly Arg Leu Val Gly Thr Thr Pro Val Ser Leu Ser His 
275 280 285 

GTT GCC AAG ATA AGA GGG ACC TCC AAT GGC ACT GTA ATC AAC CTT ACT 6251 
Val Ala Lys Ile Arg Gly Thr Ser Asn Gly Thr Val Ile Asn Leu Thr 
290 295 300 

GAA TTG GAT GGC ACA CCC TTT CAC CCT TTT GAG GGC CCT GCC CCC ATT 6299 
Glu Leu Asp Gly Thr Pro Phe His Pro Phe Glu Gly Pro Ala Pro Ile 
305 310 315 

GGG TTT CCA GAC CTC GGT GGT TGT GAT TGG CAT ATC AAT ATG ACA CAG 6347 
Gly Phe Pro Asp Leu Gly Gly Cys Asp Trp His Ile Asn Met Thr Gln 
320 325 330 

TTT GGC CAT TCT AGC CAG ACC CAG TAT GAT GTA GAC ACC ACC CCT GAC 6395 
Phe Gly His Ser Ser Gln Thr Gln Tyr Asp Val Asp Thr Thr Pro Asp 
335 340 345 350 

ACT TTT GTC CCC CAT CTT GGT TCA ATT CAG GCA AAT GGC ATT GGC AGT 6443 
Thr Phe Val Pro His Leu Gly Ser Ile Gln Ala Asn Gly Ile Gly Ser 
355 360 365 

GGT AAT TAT GTT GGT GTT CTT AGC TGG ATT TCC CCC CCA TCA CAC CCG 6491 
Gly Asn Tyr Val Gly Val Leu Ser Trp Ile Ser Pro Pro Ser His Pro 
370 375 380 

TCT GGC TCC CAA GTT GAC CTT TGG AAG ATC CCC AAT TAT GGG TCA AGT 6539 
Ser Gly Ser Gln Val Asp Leu Trp Lys Ile Pro Asn Tyr Gly Ser Ser 
385 390 395 

ATT ACG GAG GCA ACA CAT CTA GCC CCT TCT GTA TAC CCC CCT GGT TTC 6587 
Ile Thr Glu Ala Thr His Leu Ala Pro Ser Val Tyr Pro Pro Gly Phe 
400 405 410 

GGA GAG GTA TTG GTC TTT TTC ATG TCA AAA ATG CCA GGT CCT GGT GCT 6635 
Gly Glu Val Leu Val Phe Phe Met Ser Lys Met Pro Gly Pro Gly Ala 
415 420 425 430 

TAT AAT TTG CCC TGT CTA TTA CCA CAA GAG TAC ATT TCA CAT CTT GCT 6683 
Tyr Asn Leu Pro Cys Leu Leu Pro Gln Glu Tyr Ile Ser His Leu Ala 
435 440 445 

AGT GAA CAA GCC CCT ACT GTA GGT GAG GCT GCC CTG CTC CAC TAT GTT 6731 
Ser Glu Gln Ala Pro Thr Val Gly Glu Ala Ala Leu Leu His Tyr Val 
450 455 460 

GAC CCT GAT ACC GGT CGG AAT CTT GGG GAA TTC AAA GCA TAC CCT GAT 6779 
Asp Pro Asp Thr Gly Arg Asn Leu Gly Glu Phe Lys Ala Tyr Pro Asp 
465 470 475 

GGT TTC CTC ACT TGT GTC CCC AAT GGG GCT AGC TCG GGT CCA CAA CAG 6827 
Gly Phe Leu Thr Cys Val Pro Asn Gly Ala Ser Ser Gly Pro Gln Gln 
480 485 490 
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Table 4, continued 

CTG CCG ATC AAT GGG GTC TTT GTC TTT GTT TCA TGG GTG TCC AGA TTT 6875 
Leu Pro Ile Asn Gly Val Phe Val Phe Val Ser Trp Val Ser Arg Phe 
495 500 505 510 

TAT CAA TTA AAG CCT GTG GGA ACT GCC AGC TCG GCA AGA GGT AGG CTT 6923 
Tyr Gln Leu Lys Pro Val Gly Thr Ala Ser Ser Ala Arg Gly Arg Leu 
515 520 525 

GGT CTG CGC CGA TAATGGCCCA AGCCATAATT GGTGCAATTG CTGCTTCCAC 6975 
Gly Leu Arg Arg 
530 
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Table 5. The amino acid sequence deduced from nucleotides 6938 through 
7573 of the Norwalk virus genome shown in Table 2. 



CCAGCTCGGC AAGAGGTAGG CTTGGTCTGC GCCGATA ATG GCC CAA GCC ATA ATT 6955 
Met Ala Gln Ala Ile Ile 
1 5 

GGT GCA ATT GCT GCT TCC ACA GCA GGT AGT GCT CTG GGA GCG GGC ATA 7003 
Gly Ala Ile Ala Ala Ser Thr Ala Gly Ser Ala Leu Gly Ala Gly Ile 
10 15 20 

CAG GTT GGT GGC GAA GCG GCC CTC CAA AGC CAA AGG TAT CAA CAA AAT 7051 
Gln Val Gly Gly Glu Ala Ala Leu Gln Ser Gln Arg Tyr Gln Gln Asn 
25 30 35 

TTG CAA CTG CAA GAA AAT TCT TTT AAA CAT GAC AGG GAA ATG ATT GGG 7099 
Leu Gln Leu Gln Glu Asn Ser Phe Lys His Asp Arg Glu Met Ile Gly 
40 45 50 

TAT CAG GTT GAA GCT TCA AAT CAA TTA TTG GCT AAA AAT TTG GCA ACT 7147 
Tyr Gln Val Glu Ala Ser Asn Gln Leu Leu Ala Lys Asn Leu Ala Thr 
55 60 65 70 

AGA TAT TCA CTC CTC CGT GCT GGG GGT TTG ACC AGT GCT GAT GCA GCA 7195 
Arg Tyr Ser Leu Leu Arg Ala Gly Gly Leu Thr Ser Ala Asp Ala Ala 
75 80 85 

AGA TCT GTG GCA GGA GCT CCA GTC ACC CGC ATT GTA GAT TGG AAT GGC 7243 
Arg Ser Val Ala Gly Ala Pro Val Thr Arg Ile Val Asp Trp Asn Gly 
90 95 100 

GTG AGA GTG TCT GCT CCC GAG TCC TCT GCT ACC ACA TTG AGA TCC GGT 7291 
Val Arg Val Ser Ala Pro Glu Ser Ser Ala Thr Thr Leu Arg Ser Gly 
105 110 115 

GGC TTC ATG TCA GTT CCC ATA CCA TTT GCC TCT AAG CAA AAA CAG GTT 7339 
Gly Phe Met Ser Val Pro Ile Pro Phe Ala Ser Lys Gln Lys Gln Val 
120 125 130 

CAA TCA TCT GGT ATT AGT AAT CCA AAT TAT TCC CCT TCA TCC ATT TCT 7387 
Gln Ser Ser Gly Ile Ser Asn Pro Asn Tyr Ser Pro Ser Ser Ile Ser 
135 140 145 150 

CGA ACC ACT AGT TGG GTC GAG TCA CAA AAC TCA TCG AGA TTT GGA AAT 7435 
Arg Thr Thr Ser Trp Val Glu Ser Gln Asn Ser Ser Arg Phe Gly Asn 
155 160 165 

CTT TCT CCA TAC CAC GCG GAG GCT CTC AAT ACA GTG TGG TTG ACT CCA 7483 
Leu Ser Pro Tyr His Ala Glu Ala Leu Asn Thr Val Trp Leu Thr Pro 
170 175 180 

CCC GGT TCA ACA GCC TCT TCT ACA CTG TCT TCT GTG CCA CGT GGT TAT 7531 
Pro Gly Ser Thr Ala Ser Ser Thr Leu Ser Ser Val Pro Arg Gly Tyr 
185 190 195 

TTC AAT ACA GAC AGG TTG CCA TTA TTC GCA AAT AAT AGG CGA 7573 
Phe Asn Thr Asp Arg Leu Pro Leu Phe Ala Asn Asn Arg Arg 
200 205 210 
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Table 6. Primers used for detection of Norwalk-related 
virus by PCR 
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GAC ATT GTC TG  6134 
P-72  6296  5' CAT TGG GTT TCC AGA CCT A  6313 
P-63  6511  5' ATA ATT GGG GAT CTT CCA AA  6530 
P-76*  6095  5' TAG TGG CAT GGG TAT TTC  6114 
P-77  6316  5' TAT GCC AAT CAC AGC CAC  6333 
P-64  6491  5' GTC TGG CTC CCA AGT TGA CC  6510 
P-75  6726  5' CGG TAT CAG GGT CAA CAT  6744 
P-74  6707  5' TGA GGC TGC CCT GCT CCA  6724 
P-3  7009  5' CCA CCG CTG TCC GGG AGG  7027 
P-36 (New) #  4475  5' GTT GCT GTT GGC ATT AAC A  4493 



* Based on KY89 sequence. # Based on HuCV Sapporo sequence. 
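
The position columns of Table 6 can be checked against SEQ ID NO:1. The sketch below is illustrative Python, not part of the original disclosure. It assumes that the two numbers flanking each primer are the first and last genome positions the primer covers, and it uses only the 6481 to 6540 row of the sequence listing; the names `REGION`, `REGION_START`, `reverse_complement` and `primer_orientation` are hypothetical.

```python
# Genome positions 6481..6540 as printed in SEQ ID NO:1 (illustrative names).
REGION = (
    "CATCACACCC" "GTCTGGCTCC" "CAAGTTGACC"
    "TTTGGAAGAT" "CCCCAATTAT" "GGGTCAAGTA"
)
REGION_START = 6481  # genome coordinate of REGION[0]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_orientation(primer, start, end):
    """Compare a primer with genome positions start..end (1-based, inclusive).

    Returns "sense" if the primer equals the genome strand, "antisense" if it
    equals the reverse complement, and "no match" otherwise.
    """
    primer = primer.replace(" ", "").upper()
    target = REGION[start - REGION_START:end - REGION_START + 1]
    if primer == target:
        return "sense"
    if primer == reverse_complement(target):
        return "antisense"
    return "no match"


if __name__ == "__main__":
    print("P-64", primer_orientation("GTC TGG CTC CCA AGT TGA CC", 6491, 6510))
    print("P-63", primer_orientation("ATA ATT GGG GAT CTT CCA AA", 6511, 6530))
```

Under these assumptions P-64 matches the listed strand directly and P-63 matches its reverse complement, which is consistent with reading the two columns as start and end coordinates. The labels "sense" and "antisense" are this sketch's wording, not the table's.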
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SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: Matson, David O 
Estes, Mary K 
Jiang, Xi 
Graham, David Y 

(ii) TITLE OF INVENTION: Methods and Reagents to Detect and 
Characterize Norwalk and Related Viruses 

(iii) NUMBER OF SEQUENCES: 75 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Fulbright & Jaworski Patent Dept 

(B) STREET: 1301 McKinney, Suite 5100 

(C) CITY: Houston 

(D) STATE: Texas 
(E) COUNTRY: USA 
(F) ZIP: 77010-3095 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY / AGENT INFORMATION: 

(A) NAME: Launer, Charlene A 

(B) REGISTRATION NUMBER: 33,035 

(C) REFERENCE/DOCKET NUMBER: D-5526 

(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: 713-651-3634 
(B) TELEFAX: 713-651-5246 
(C) TELEX: Western Union 762829 



(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7753 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Norwalk virus 

(B) STRAIN: 8FIIa 

(C) INDIVIDUAL ISOLATE: 8FIIa 



(vii) IMMEDIATE SOURCE: 

(B) CLONE: pUCNV-953 and its derivatives 

(ix) FEATURE: 
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(A) NAME/KEY: CDS 

(B) LOCATION: 146..5359 

(D) OTHER INFORMATION: /note= "The protein encoded by 
nucleotides 146 through 5359 is eventually cleaved 
to make at least a picornavirus 2C-like protein, a 
3C-like protease and an RNA-dependent RNA polymerase." 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 5346..6935 

(D) OTHER INFORMATION: /note= "Nucleotides 5346 through 
5359 are used for coding two different amino acid 
sequences: the first amino acid sequence is coded by 
nucleotides 146 through 5359, the second by nucleotides 
5346 through 6935." 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 6938..7573 
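
The three CDS locations above can be checked with simple coordinate arithmetic. The sketch below is illustrative Python and not part of the listing itself; the labels `first`, `second` and `third` and the helper names `summarize` and `overlap` are hypothetical, and `frame` is simply the start coordinate modulo 3.

```python
# CDS coordinates as given in the FEATURE entries (1-based, inclusive).
CDS = {
    "first": (146, 5359),
    "second": (5346, 6935),
    "third": (6938, 7573),
}


def summarize(start, end):
    """Return length, codon count and reading frame for one CDS."""
    length = end - start + 1
    return {
        "length_nt": length,
        "codons": length // 3,
        "whole_codons": length % 3 == 0,
        "frame": start % 3,  # equal values mean the same reading frame
    }


def overlap(a, b):
    """Number of nucleotides shared by two inclusive coordinate ranges."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0, hi - lo + 1)


if __name__ == "__main__":
    for name, (start, end) in CDS.items():
        print(name, summarize(start, end))
    print("first/second overlap:", overlap(CDS["first"], CDS["second"]))
```

With these coordinates the three regions come to 1738, 530 and 212 codons, which agrees with the residue numbering at the ends of Tables 3, 4 and 5. The first and second regions share 14 nucleotides (5346 through 5359) and start in different frames, matching the note that this stretch codes for two different amino acid sequences.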

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

GGCGTCAAAA GACGTCGTTC CTACTGCTGC TAGCAGTGAA AATGCTAACA ACAATAGTAG 60 

TATTAAGTCT CGTCTATTGG CGAGACTCAA GGGTTCAGGT GGGGCTACGT CCCCACCCAA 120 

CTCGATAAAG ATAACCAACC AAGATATGGC TCTGGGGCTG ATTGGACAGG TCCCAGCGCC 180 

AAAGGCCACA TCCGTCGATG TCCCTAAACA ACAGAGGGAT AGACCACCAC GGACTGTTGC 240 

CGAAGTTCAA CAAAATTTGC GTTGGACTGA GAGACCACAA GACCAGAATG TTAAGACGTG 300 

GGATGAGCTT GACCACACAA CAAAACAACA GATACTTGAT GAACACGCTG AGTGGTTTGA 360 

TGCCGGTGGC TTAGGTCCAA GTACACTACC CACTAGTCAT GAACGGTACA CACATGAGAA 420 

TGATGAAGGC CACCAGGTAA AGTGGTCGGC TAGGGAAGGT GTAGACCTTG GCATATCCGG 480 

GCTCACGACG GTGTCTGGGC CTGAGTGGAA TATGTGCCCG CTACCACCAG TTGACCAAAG 540 

GAGCACGACA CCTGCAACTG AGCCCACAAT TGGTGACATG ATCGAATTCT ATGAAGGGCA 600 

CATCTATCAT TATGCTATAT ACATAGGTCA AGGCAAGACG GTGGGTGTAC ACTCCCCTCA 660 

AGCAGCCTTC TCAATAACGA GGATCACCAT ACAGCCCATA TCAGCTTGGT GGCGAGTCTG 720 

TTATGTCCCA CAACCAAAAC AGAGGCTCAC ATACGACCAA CTCAAAGAAT TAGAAAATGA 780 

ACCATGGCCG TATGCCGCAG TCACGAACAA CTGCTTCGAA TTTTGTTGCC AGGTCATGTG 840 

CTTGGAAGAT ACTTGGTTGC AAAGGAAGCT CATCTCCTCT GGCCGGTTTT ACCACCCGAC 900 

CCAAGATTGG TCCCGAGACA CTCCAGAATT CCAACAAGAC AGCAAGTTAG AGATGGTTAG 960 

GGATGCAGTG CTAGCCGCTA TAAATGGGTT GGTGTCGCGG CCATTTAAAG ATCTTCTGGG 1020 

TAAGCTCAAA CCCTTGAACG TGCTTAACTT ACTTTCAAAC TGTGATTGGA CGTTCATGGG 1080 

GGTCGTGGAG ATGGTGGTCC TCCTTTTAGA ACTCTTTGGA ATCTTTTGGA ACCCACCTGA 1140 

TGTTTCCAAC TTTATAGCTT CACTCCTGCC AGATTTCCAT CTACAGGGCC CCGAGGACCT 1200 

TGCCAGGGAT CTCGTGCCAA TAGTATTGGG GGGGATCGGC TTAGCCATAG GATTCACCAG 1260 

AGACAAGGTA AGTAAGATGA TGAAGAATGC TGTTGATGGA CTTCGTGCGG CAACCCAGCT 1320 

CGGTCAATAT GGCCTAGAAA TATTCTCATT ACTAAAGAAG TACTTCTTCG GTGGTGATCA 1380 

AACAGAGAAA ACCCTAAAAG ATATTGAGTC AGCAGTTATA GATATGGAAG TACTATCATC 1440 

TACATCAGTG ACTCAGCTCG TGAGGGACAA ACAGTCTGCA CGGGCTTATA TGGCCATCTT 1500 

AGATAATGAA GAAGAAAAGG CAAGGAAATT ATCTGTCAGG AATGCCGACC CACACGTAGT 1560 

ATCCTCTACC AATGCTCTCA TATCCCGGAT CTCAATGGCT AGGGCTGCAT TGGCCAAGGC 1620 

TCAAGCTGAA ATGACCAGCA GGATGCGTCC TGTGGTCATT ATGATGTGTG GGCCCCCTGG 1680 

TATAGGTAAA ACCAAGGCAG CAGAACATCT GGCTAAACGC CTAGCCAATG AGATACGGCC 1740 

TGGTGGTAAG GTTGGGCTGG TCCCACGGGA GGCAGTGGAT CATTGGGATG GATATCACGG 1800 

AGAGGAAGTG ATGCTGTGGG ACGACTATGG AATGACAAAG ATACAGGAAG ACTGTAATAA 1860 

ACTGCAAGCC ATAGCCGACT CAGCCCCCCT AACACTCAAT TGTGACCGAA TAGAAAACAA 1920 

GGGAATGCAA TTTGTGTCTG ATGCTATAGT CATCACCACC AATGCTCCTG GCCCAGCCCC 1980 

AGTGGACTTT GTCAACCTCG GGCCTGTTTG CCGAAGGGTG GACTTCCTTG TGTATTGCAC 2040 

GGCACCTGAA GTTGAACACA CGAGGAAAGT CAGTCCTGGG GACACAACTG CACTGAAAGA 2100 

CTGCTTCAAG CCCGATTTCT CACATCTAAA AATGGAGTTG GCTCCCCAAG GGGGCTTTGA 2160 

TAACCAAGGG AATACCCCGT TTGGTAAGGG TGTGATGAAG CCCACCACCA TAAACAGGCT 2220 

GTTAATCCAG GCTGTAGCCT TGACGATGGA GAGACAGGAT GAGTTCCAAC TCCAGGGGCC 2280 

TACGTATGAC TTTGATACTG ACAGAGTAGC TGCGTTCACG AGGATGGCCC GAGCCAACGG 2340 

GTTGGGTCTC ATATCCATGG CCTCCCTAGG CAAAAAGCTA CGCAGTGTCA CCACTATTGA 2400 

AGGATTAAAG AATGCTCTAT CAGGCTATAA AATATCAAAA TGCAGTATAC AATGGCAGTC 2460 

AAGGGTGTAC ATTATAGAAT CAGATGGTGC CAGTGTACAA ATCAAAGAAG ACAAGCAAGC 2520 

TTTGACCCCT CTGCAGCAGA CAATTAACAC GGCCTCACTT GCCATCACTC GACTCAAAGC 2580 
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AGCTAGGGCT GTGGCATACG CTTCATGTTT CCAGTCCGCC ATAACTACCA TACTACAAAT 2640 

GGCGGGATCT GCGCTCGTTA TTAATCGAGC GGTCAAGCGT ATGTTTGGTA CCCGTACAGC 2700 

AGCCATGGCA TTAGAAGGAC CTGGGAAAGA ACATAATTGC AGGGTCCATA AGGCTAAGGA 2760 

AGCTGGAAAG GGGCCCATAG GTCATGATGA CATGGTAGAA AGGTTTGGCC TATGTGAAAC 2820 

TGAAGAGGAG GAGAGTGAGG ACCAAATTCA AATGGTACCA AGTGATGCCG TCCCAGAAGG 2880 

AAAGAACAAA GGCAAGACCA AAAAGGGACG TGGTCGCAAA AATAACTATA ATGCATTCTC 2940 

TCGCCGTGGT CTGAGTGATG AAGAATATGA AGAGTACAAA AAGATCAGAG AAGAAAAGAA 3000 

TGGCAATTAT AGTATACAAG AATACTTGGA GGACCGCCAA CGATATGAGG AAGAATTAGC 3060 

AGAGGTACAG GCAGGTGGTG ATGGTGGCAT AGGAGAAACT GAAATGGAAA TCCGTCACAG 3120 

GGTCTTCTAT AAATCCAAGA GTAAGAAACA CCAACAAGAG CAACGGCGAC AACTTGGTCT 3180 

AGTGACTGGA TCAGACATCA GAAAACGTAA GCCCATTGAC TGGACCCCGC CAAAGAATGA 3240 

ATGGGCAGAT GATGACAGAG AGGTGGATTA TAATGAAAAG ATCAATTTTG AAGCTCCCCC 3300 

GACACTATGG AGCCGAGTCA CAAAGTTTGG ATCAGGATGG GGCTTTTGGG TCAGCCCGAC 3360 

AGTGTTCATC ACAACCACAC ATGTAGTGCC AACTGGTGTG AAAGAATTCT TTGGTGAGCC 3420 

CCTATCTAGT ATAGCAATCC ACCAAGCAGG TGAGTTCACA CAATTCAGGT TCTCAAAGAA 3480 

AATGCGCCCT GACTTGACAG GTATGGTCCT TGAAGAAGGT TGCCCTGAAG GGACAGTCTG 3540 

CTCAGTCCTA ATTAAACGGG ATTCGGGTGA ACTACTTCCG CTAGCCGTCC GTATGGGGGC 3600 

TATTGCCTCC ATGAGGATAC AGGGTCGGCT TGTCCATGGC CAATCAGGGA TGTTACTGAC 3660 

AGGGGCCAAT GCAAAGGGGA TGGATCTTGG CACTATACCA GGAGACTGCG GGGCACCATA 3720 

CGTCCACAAG CGCGGGAATG ACTGGGTTGT GTGTGGAGTC CACGCTGCAG CCACAAAGTC 3780 

AGGCAACACC GTGGTCTGCG CTGTACAGGC TGGAGAGGGC GAAACCGCAC TAGAAGGTGG 3840 

AGACAAGGGG CATTATGCCG GCCACGAGAT TGTGAGGTAT GGAAGTGGCC CAGCACTGTC 3900 

AACTAAAACA AAATTCTGGA GGTCCTCCCC AGAACCACTG CCCCCCGGAG TATATGAGCC 3960 

AGCATACCTG GGGGGCAAGG ACCCCCGTGT ACAGAATGGC CCATCCCTAC AACAGGTACT 4020 

ACGTGACCAA CTGAAACCCT TTGCGGACCC CCGCGGCCGC ATGCCTGAGC CTGGCCTACT 4080 

GGAGGCTGCG GTTGAGACTG TAACATCCAT GTTAGAACAG ACAATGGATA CCCCAAGCCC 4140 

GTGGTCTTAC GCTGATGCCT GCCAATCTCT TGACAAAACT ACTAGTTCGG GGTACCCTCA 4200 

CCATAAAAGG AAGAATGATG ATTGGAATGG CACCACCTTC GTTGGAGAGC TCGGTGAGCA 4260 

AGCTGCACAC GCCAACAATA TGTATGAGAA TGCTAAACAT ATGAAACCCA TTTACACTGC 4320 

AGCCTTAAAA GATGAACTAG TCAAGCCAGA AAAGATTTAT CAAAAAGTCA AGAAGCGTCT 4380 

ACTATGGGGC GCCGATCTCG GAACAGTGGT CAGGGCCGCC CGGGCTTTTG GCCCATTTTG 4440 

TGACGCTATA AAATCACATG TCATCAAATT GCCAATAAAA GTTGGCATGA ACACAATAGA 4500 

AGATGGCCCC CTCATCTATG CTGAGCATGC TAAATATAAG AATCATTTTG ATGCAGATTA 4560 

TACAGCATGG GACTCAACAC AAAATAGACA AATTATGACA GAATCCTTCT CCATTATGTC 4620 

GCGCCTTACG GCCTCACCAG AATTGGCCGA GGTTGTGGCC CAAGATTTGC TAGCACCATC 4680 

TGAGATGGAT GTAGGTGATT ATGTCATCAG GGTCAAAGAG GGGCTGCCAT CTGGATTCCC 4740 

ATGTACTTCC CAGGTGAACA GCATAAATCA CTGGATAATT ACTCTCTGTG CACTGTCTGA 4800 

GGCCACTGGT TTATCACCTG ATGTGGTGCA ATCCATGTCA TATTTCTCAT TTTATGGTGA 4860 

TGATGAGATT GTGTCAACTG ACATAGATTT TGACCCAGCC CGCCTCACTC AAATTCTCAA 4920

GGAATATGGC CTCAAACCAA CAAGGCCTGA CAAAACAGAA GGACGAATAC AAGTGAGGAA 4980 

AAATGTGGAT GGACTGGTCT TCTTGCGGCG CACCATTTCC CGTGATGCGG CAGGGTTCCA 5040 

AGGCAGGTTA GATAGGGCTT CGATTGAACG CCAAATCTTC TGGACCCGCG GGCCCAATGA 5100 

TTCAGATCCA TCAGAGACTC TAGTGCCACA CACTCAAAGA AAAATACAGT TGATTTCACT 5160 

TCTAGGGGAA GCTTCACTCC ATGGTGAGAA ATTTTACAGA AAGATTTCCA GCAAGGTCAT 5220 

ACATGAAATC AAGACTGGTG GATTGGAAAT GTATGTCCCA GGATGGCAGG CCATGTTCCG 5280 

CTGGATGCGC TTCCATGACC TCGGATTGTG GACAGGAGAT CGCGATCTTC TGCCCGAATT 5340

CGTAAATGAT GATGGCGTCT AAGGACGCTA CATCAAGCGT GGATGGCGCT AGTGGCGCTG 5400 

GTCAGTTGGT ACCGGAGGTT AATGCTTCTG ACCCTCTTGC AATGGATCCT GTAGCAGGTT 5460 

CTTCGACAGC AGTCGCGACT GCTGGACAAG TTAATCCTAT TGATCCCTGG ATAATTAATA 5520 

ATTTTGTGCA AGCCCCCCAA GGTGAATTTA CTATTTCCCC AAATAATACC CCCGGTGATG 5580

TTTTGTTTGA TTTGAGTTTG GGTCCCCATC TTAATCCTTT CTTGCTCCAT CTATCACAAA 5640 

TGTATAATGG TTGGGTTGGT AACATGAGAG TCAGGATTAT GCTAGCTGGT AATGCCTTTA 5700 

CTGCGGGGAA GATAATAGTT TCCTGCATAC CCCCTGGTTT TGGTTCACAT AATCTTACTA 5760 

TAGCACAAGC AACTCTCTTT CCACATGTGA TTGCTGATGT TAGGACTCTA GACCCCATTG 5820 

AGGTGCCTTT GGAAGATGTT AGGAATGTTC TCTTTCATAA TAATGATAGA AATCAACAAA 5880 

CCATGCGCCT TGTGTGCATG CTGTACACCC CCCTCCGCAC TGGTGGTGGT ACTGGTGATT 5940 

CTTTTGTAGT TGCAGGGCGA GTTATGACTT GCCCCAGTCC TGATTTTAAT TTCTTGTTTT 6000 

TAGTCCCTCC TACGGTGGAG CAGAAAACCA GGCCCTTCAC ACTCCCAAAT CTGCCATTGA 6060 

GTTCTCTGTC TAACTCACGT GCCCCTCTCC CAATCAGTAG MTGGGCATT TCCCGAGACA 6120 

ATGTCCAGAG TGTGCAGTTC CAAAATGGTC GGTGTACTCT GGATGGCCGC CTGGTTGGCA 6180 

CCACCCCAGT TTCATTGTCA CATGTTGCCA AGATAAGAGG GACCTCCAAT GGCACTGTAA 6240 

TCAACCTTAC TGAATTGGAT GGCACACCCT TTCACCCTTT TGAGGGCCCT GCCCCCATTG 6300 

GGTTTCCAGA CCTCGGTGGT TGTGATTGGC ATATCAATAT GACACAGTTT GGCCATTCTA 6360 

GCCAGACCCA GTATGATGTA GACACCACCC CTGACACTTT TGTCCCCCAT CTTGGTTCAA 6420

TTCAGGCAAA TGGCATTCGC AGTGGTAATT ATGTTGGTGT TCTTAGCTGG ATTTCCCCCC 6480

CATCACACCC GTCTGGCTCC CAAGTTGACC TTTGGAAGAT CCCCAATTAT GGGTCAAGTA 6540

TTACGGAGGC AACACATCTA GCCCCTTCTG TATACCCCCC TGGTTTCGGA GAGGTATTGG 6600 

TCTTTTTCAT GTCAAAAATG CCAGGTCCTG GTGCTTATAA TTTGCCCTGT CTATTACCAC 6660 

AAGAGTACAT TTCACATCTT GCTAGTGAAC AAGCCCCTAC TGTAGGTGAG GCTGCCCTGC 6720 

TCCACTATGT TGACCCTGAT ACCGGTCGGA ATCTTGGGGA ATTCAAAGCA TACCCTGATG 6780 

GTTTCCTCAC TTGTGTCCCC AATGGGGCTA GCTCGGGTCC ACAACAGCTG CCGATCAATG 6840 

GGGTCTTTGT CTTTGTTTCA TGGGTGTCCA GATTTTATCA ATTAAAGCCT GTGGGAACTG 6900 

CCAGCTCGGC AAGAGGTAGG CTTGGTCTGC GCCGATAATG GCCCAAGCCA TAATTGGTGC 6960 

AATTGCTGCT TCCACAGCAG GTAGTGCTCT GGGAGCGGGC ATACAGGTTG GTGGCGAAGC 7020 

GGCCCTCCAA AGCCAAAGGT ATCAACAAAA TTTGCAACTG CAAGAAAATT CTTTTAAACA 7080 

TGACAGGGAA ATGATTGGGT ATCAGGTTGA AGCTTCAAAT CAATTATTGG CTAAAAATTT 7140 

GGCAACTAGA TATTCACTCC TCCGTGCTGG GGGTTTGACC AGTGCTGATG CAGCAAGATC 7200 

TGTGGCAGGA GCTCCAGTCA CCCGCATTGT AGATTGGAAT GGCGTGAGAG TGTCTGCTCC 7260

CGAGTCCTCT GCTACCACAT TGAGATCCGG TGGCTTCATG TGAGTTCCCA TACCATTTGC 7320

CTCTAAGCAA AAACAGGTTC AATCATCTGG TATTAGTAAT CCAAATTATT CCCCTTCATC 7380 

CATTTCTCGA ACCACTAGTT GGGTCGAGTC ACAAAACTCA TCGAGATTTG GAAATCTTTC 7440 

TCCATACCAC GCGGAGGCTC TCAATACAGT GTGGTTGACT CCACCCGGTT CAACAGCCTC 7500 

TTCTACACTG TCTTCTGTGC CACGTGGTTA TTTCAATACA GACAGGTTGC CATTATTCGC 7560 

AAATAATAGG CGATGATGTT GTAATATGAA ATGTGGGGAT CATATTCATT TAATTAGGTT 7620 

TAATTAGGTT TAATTTGATG TTAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 7680 

AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 7740 

AAAAAAAAAA AAA 7753 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1738 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Ala Leu Gly Leu Ile Gly Gln Val Pro Ala Pro Lys Ala Thr Ser
1 5 10 15

Val Asp Val Pro Lys Gln Gln Arg Asp Arg Pro Pro Arg Thr Val Ala
20 25 30

Glu Val Gln Gln Asn Leu Arg Trp Thr Glu Arg Pro Gln Asp Gln Asn
35 40 45

Val Lys Thr Trp Asp Glu Leu Asp His Thr Thr Lys Gln Gln Ile Leu
50 55 60

Asp Glu His Ala Glu Trp Phe Asp Ala Gly Gly Leu Gly Pro Ser Thr
65 70 75 80

Leu Pro Thr Ser His Glu Arg Tyr Thr His Glu Asn Asp Glu Gly His
85 90 95

Gln Val Lys Trp Ser Ala Arg Glu Gly Val Asp Leu Gly Ile Ser Gly
100 105 110

Leu Thr Thr Val Ser Gly Pro Glu Trp Asn Met Cys Pro Leu Pro Pro
115 120 125

Val Asp Gln Arg Ser Thr Thr Pro Ala Thr Glu Pro Thr Ile Gly Asp
130 135 140

Met Ile Glu Phe Tyr Glu Gly His Ile Tyr His Tyr Ala Ile Tyr Ile
145 150 155 160 






Gly Gln Gly Lys Thr Val Gly Val His Ser Pro Gln Ala Ala Phe Ser
165 170 175

Ile Thr Arg Ile Thr Ile Gln Pro Ile Ser Ala Trp Trp Arg Val Cys
180 185 190

Tyr Val Pro Gln Pro Lys Gln Arg Leu Thr Tyr Asp Gln Leu Lys Glu
195 200 205

Leu Glu Asn Glu Pro Trp Pro Tyr Ala Ala Val Thr Asn Asn Cys Phe
210 215 220

Glu Phe Cys Cys Gln Val Met Cys Leu Glu Asp Thr Trp Leu Gln Arg
225 230 235 240

Lys Leu Ile Ser Ser Gly Arg Phe Tyr His Pro Thr Gln Asp Trp Ser
245 250 255

Arg Asp Thr Pro Glu Phe Gln Gln Asp Ser Lys Leu Glu Met Val Arg
260 265 270

Asp Ala Val Leu Ala Ala Ile Asn Gly Leu Val Ser Arg Pro Phe Lys
275 280 285

Asp Leu Leu Gly Lys Leu Lys Pro Leu Asn Val Leu Asn Leu Leu Ser
290 295 300

Asn Cys Asp Trp Thr Phe Met Gly Val Val Glu Met Val Val Leu Leu
305 310 315 320

Leu Glu Leu Phe Gly Ile Phe Trp Asn Pro Pro Asp Val Ser Asn Phe
325 330 335

Ile Ala Ser Leu Leu Pro Asp Phe His Leu Gln Gly Pro Glu Asp Leu
340 345 350

Ala Arg Asp Leu Val Pro Ile Val Leu Gly Gly Ile Gly Leu Ala Ile
355 360 365

Gly Phe Thr Arg Asp Lys Val Ser Lys Met Met Lys Asn Ala Val Asp
370 375 380

Gly Leu Arg Ala Ala Thr Gln Leu Gly Gln Tyr Gly Leu Glu Ile Phe
385 390 395 400

Ser Leu Leu Lys Lys Tyr Phe Phe Gly Gly Asp Gln Thr Glu Lys Thr
405 410 415

Leu Lys Asp Ile Glu Ser Ala Val Ile Asp Met Glu Val Leu Ser Ser
420 425 430

Thr Ser Val Thr Gln Leu Val Arg Asp Lys Gln Ser Ala Arg Ala Tyr
435 440 445

Met Ala Ile Leu Asp Asn Glu Glu Glu Lys Ala Arg Lys Leu Ser Val
450 455 460

Arg Asn Ala Asp Pro His Val Val Ser Ser Thr Asn Ala Leu Ile Ser
465 470 475 480

Arg Ile Ser Met Ala Arg Ala Ala Leu Ala Lys Ala Gln Ala Glu Met
485 490 495 

Thr Ser Arg Met Arg Pro Val Val Ile Met Met Cys Gly Pro Pro Gly
500 505 510

Ile Gly Lys Thr Lys Ala Ala Glu His Leu Ala Lys Arg Leu Ala Asn
515 520 525

Glu Ile Arg Pro Gly Gly Lys Val Gly Leu Val Pro Arg Glu Ala Val
530 535 540 

Asp His Trp Asp Gly Tyr His Gly Glu Glu Val Met Leu Trp Asp Asp 
545 550 555 560 

Tyr Gly Met Thr Lys Ile Gln Glu Asp Cys Asn Lys Leu Gln Ala Ile
565 570 575

Ala Asp Ser Ala Pro Leu Thr Leu Asn Cys Asp Arg Ile Glu Asn Lys
580 585 590

Gly Met Gln Phe Val Ser Asp Ala Ile Val Ile Thr Thr Asn Ala Pro
595 600 605

Gly Pro Ala Pro Val Asp Phe Val Asn Leu Gly Pro Val Cys Arg Arg
610 615 620

Val Asp Phe Leu Val Tyr Cys Thr Ala Pro Glu Val Glu His Thr Arg
625 630 635 640

Lys Val Ser Pro Gly Asp Thr Thr Ala Leu Lys Asp Cys Phe Lys Pro
645 650 655

Asp Phe Ser His Leu Lys Met Glu Leu Ala Pro Gln Gly Gly Phe Asp
660 665 670

Asn Gln Gly Asn Thr Pro Phe Gly Lys Gly Val Met Lys Pro Thr Thr
675 680 685

Ile Asn Arg Leu Leu Ile Gln Ala Val Ala Leu Thr Met Glu Arg Gln
690 695 700

Asp Glu Phe Gln Leu Gln Gly Pro Thr Tyr Asp Phe Asp Thr Asp Arg
705 710 715 720

Val Ala Ala Phe Thr Arg Met Ala Arg Ala Asn Gly Leu Gly Leu Ile
725 730 735

Ser Met Ala Ser Leu Gly Lys Lys Leu Arg Ser Val Thr Thr Ile Glu
740 745 750

Gly Leu Lys Asn Ala Leu Ser Gly Tyr Lys Ile Ser Lys Cys Ser Ile
755 760 765

Gln Trp Gln Ser Arg Val Tyr Ile Ile Glu Ser Asp Gly Ala Ser Val
770 775 780

Gln Ile Lys Glu Asp Lys Gln Ala Leu Thr Pro Leu Gln Gln Thr Ile
785 790 795 800

Asn Thr Ala Ser Leu Ala Ile Thr Arg Leu Lys Ala Ala Arg Ala Val
805 810 815

Ala Tyr Ala Ser Cys Phe Gln Ser Ala Ile Thr Thr Ile Leu Gln Met
820 825 830

Ala Gly Ser Ala Leu Val Ile Asn Arg Ala Val Lys Arg Met Phe Gly
835 840 845 



Thr Arg Thr Ala Ala Met Ala Leu Glu Gly Pro Gly Lys Glu His Asn
850 855 860

Cys Arg Val His Lys Ala Lys Glu Ala Gly Lys Gly Pro Ile Gly His
865 870 875 880

Asp Asp Met Val Glu Arg Phe Gly Leu Cys Glu Thr Glu Glu Glu Glu
885 890 895

Ser Glu Asp Gln Ile Gln Met Val Pro Ser Asp Ala Val Pro Glu Gly
900 905 910

Lys Asn Lys Gly Lys Thr Lys Lys Gly Arg Gly Arg Lys Asn Asn Tyr
915 920 925

Asn Ala Phe Ser Arg Arg Gly Leu Ser Asp Glu Glu Tyr Glu Glu Tyr
930 935 940

Lys Lys Ile Arg Glu Glu Lys Asn Gly Asn Tyr Ser Ile Gln Glu Tyr
945 950 955 960

Leu Glu Asp Arg Gln Arg Tyr Glu Glu Glu Leu Ala Glu Val Gln Ala
965 970 975

Gly Gly Asp Gly Gly Ile Gly Glu Thr Glu Met Glu Ile Arg His Arg
980 985 990

Val Phe Tyr Lys Ser Lys Ser Lys Lys His Gln Gln Glu Gln Arg Arg
995 1000 1005

Gln Leu Gly Leu Val Thr Gly Ser Asp Ile Arg Lys Arg Lys Pro Ile
1010 1015 1020

Asp Trp Thr Pro Pro Lys Asn Glu Trp Ala Asp Asp Asp Arg Glu Val
1025 1030 1035 1040

Asp Tyr Asn Glu Lys Ile Asn Phe Glu Ala Pro Pro Thr Leu Trp Ser
1045 1050 1055

Arg Val Thr Lys Phe Gly Ser Gly Trp Gly Phe Trp Val Ser Pro Thr
1060 1065 1070

Val Phe Ile Thr Thr Thr His Val Val Pro Thr Gly Val Lys Glu Phe
1075 1080 1085

Phe Gly Glu Pro Leu Ser Ser Ile Ala Ile His Gln Ala Gly Glu Phe
1090 1095 1100

Thr Gln Phe Arg Phe Ser Lys Lys Met Arg Pro Asp Leu Thr Gly Met
1105 1110 1115 1120

Val Leu Glu Glu Gly Cys Pro Glu Gly Thr Val Cys Ser Val Leu Ile
1125 1130 1135

Lys Arg Asp Ser Gly Glu Leu Leu Pro Leu Ala Val Arg Met Gly Ala
1140 1145 1150

Ile Ala Ser Met Arg Ile Gln Gly Arg Leu Val His Gly Gln Ser Gly
1155 1160 1165

Met Leu Leu Thr Gly Ala Asn Ala Lys Gly Met Asp Leu Gly Thr Ile
1170 1175 1180

Pro Gly Asp Cys Gly Ala Pro Tyr Val His Lys Arg Gly Asn Asp Trp
1185 1190 1195 1200

Val Val Cys Gly Val His Ala Ala Ala Thr Lys Ser Gly Asn Thr Val
1205 1210 1215

Val Cys Ala Val Gln Ala Gly Glu Gly Glu Thr Ala Leu Glu Gly Gly
1220 1225 1230

Asp Lys Gly His Tyr Ala Gly His Glu Ile Val Arg Tyr Gly Ser Gly
1235 1240 1245

Pro Ala Leu Ser Thr Lys Thr Lys Phe Trp Arg Ser Ser Pro Glu Pro
1250 1255 1260

Leu Pro Pro Gly Val Tyr Glu Pro Ala Tyr Leu Gly Gly Lys Asp Pro
1265 1270 1275 1280

Arg Val Gln Asn Gly Pro Ser Leu Gln Gln Val Leu Arg Asp Gln Leu
1285 1290 1295

Lys Pro Phe Ala Asp Pro Arg Gly Arg Met Pro Glu Pro Gly Leu Leu
1300 1305 1310

Glu Ala Ala Val Glu Thr Val Thr Ser Met Leu Glu Gln Thr Met Asp
1315 1320 1325

Thr Pro Ser Pro Trp Ser Tyr Ala Asp Ala Cys Gln Ser Leu Asp Lys
1330 1335 1340

Thr Thr Ser Ser Gly Tyr Pro His His Lys Arg Lys Asn Asp Asp Trp
1345 1350 1355 1360

Asn Gly Thr Thr Phe Val Gly Glu Leu Gly Glu Gln Ala Ala His Ala
1365 1370 1375

Asn Asn Met Tyr Glu Asn Ala Lys His Met Lys Pro Ile Tyr Thr Ala
1380 1385 1390

Ala Leu Lys Asp Glu Leu Val Lys Pro Glu Lys Ile Tyr Gln Lys Val
1395 1400 1405

Lys Lys Arg Leu Leu Trp Gly Ala Asp Leu Gly Thr Val Val Arg Ala
1410 1415 1420

Ala Arg Ala Phe Gly Pro Phe Cys Asp Ala Ile Lys Ser His Val Ile
1425 1430 1435 1440

Lys Leu Pro Ile Lys Val Gly Met Asn Thr Ile Glu Asp Gly Pro Leu
1445 1450 1455

Ile Tyr Ala Glu His Ala Lys Tyr Lys Asn His Phe Asp Ala Asp Tyr
1460 1465 1470

Thr Ala Trp Asp Ser Thr Gln Asn Arg Gln Ile Met Thr Glu Ser Phe
1475 1480 1485

Ser Ile Met Ser Arg Leu Thr Ala Ser Pro Glu Leu Ala Glu Val Val
1490 1495 1500

Ala Gln Asp Leu Leu Ala Pro Ser Glu Met Asp Val Gly Asp Tyr Val
1505 1510 1515 1520

Ile Arg Val Lys Glu Gly Leu Pro Ser Gly Phe Pro Cys Thr Ser Gln
1525 1530 1535

Val Asn Ser Ile Asn His Trp Ile Ile Thr Leu Cys Ala Leu Ser Glu
1540 1545 1550

Ala Thr Gly Leu Ser Pro Asp Val Val Gln Ser Met Ser Tyr Phe Ser
1555 1560 1565

Phe Tyr Gly Asp Asp Glu Ile Val Ser Thr Asp Ile Asp Phe Asp Pro
1570 1575 1580

Ala Arg Leu Thr Gln Ile Leu Lys Glu Tyr Gly Leu Lys Pro Thr Arg
1585 1590 1595 1600

Pro Asp Lys Thr Glu Gly Pro Ile Gln Val Arg Lys Asn Val Asp Gly
1605 1610 1615

Leu Val Phe Leu Arg Arg Thr Ile Ser Arg Asp Ala Ala Gly Phe Gln
1620 1625 1630

Gly Arg Leu Asp Arg Ala Ser Ile Glu Arg Gln Ile Phe Trp Thr Arg
1635 1640 1645

Gly Pro Asn His Ser Asp Pro Ser Glu Thr Leu Val Pro His Thr Gln
1650 1655 1660

Arg Lys Ile Gln Leu Ile Ser Leu Leu Gly Glu Ala Ser Leu His Gly
1665 1670 1675 1680

Glu Lys Phe Tyr Arg Lys Ile Ser Ser Lys Val Ile His Glu Ile Lys
1685 1690 1695

Thr Gly Gly Leu Glu Met Tyr Val Pro Gly Trp Gln Ala Met Phe Arg
1700 1705 1710 

Trp Met Arg Phe His Asp Leu Gly Leu Trp Thr Gly Asp Arg Asp Leu 
1715 1720 1725 

Leu Pro Glu Phe Val Asn Asp Asp Gly Val 
1730 1735 

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 530 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Met Met Met Ala Ser Lys Asp Ala Thr Ser Ser Val Asp Gly
1 5 10

Ala Ser Gly Ala Gly Gln Leu Val Pro Glu Val Asn Ala Ser Asp Pro
15 20 25 30

Leu Ala Met Asp Pro Val Ala Gly Ser Ser Thr Ala Val Ala Thr Ala
35 40 45

Gly Gln Val Asn Pro Ile Asp Pro Trp Ile Ile Asn Asn Phe Val Gln
50 55 60

Ala Pro Gln Gly Glu Phe Thr Ile Ser Pro Asn Asn Thr Pro Gly Asp
65 70 75

Val Leu Phe Asp Leu Ser Leu Gly Pro His Leu Asn Pro Phe Leu Leu
80 85 90

His Leu Ser Gln Met Tyr Asn Gly Trp Val Gly Asn Met Arg Val Arg
95 100 105 110

Ile Met Leu Ala Gly Asn Ala Phe Thr Ala Gly Lys Ile Ile Val Ser
115 120 125

Cys Ile Pro Pro Gly Phe Gly Ser His Asn Leu Thr Ile Ala Gln Ala
130 135 140

Thr Leu Phe Pro His Val Ile Ala Asp Val Arg Thr Leu Asp Pro Ile
145 150 155

Glu Val Pro Leu Glu Asp Val Arg Asn Val Leu Phe His Asn Asn Asp
160 165 170

Arg Asn Gln Gln Thr Met Arg Leu Val Cys Met Leu Tyr Thr Pro Leu
175 180 185 190

Arg Thr Gly Gly Gly Thr Gly Asp Ser Phe Val Val Ala Gly Arg Val
195 200 205

Met Thr Cys Pro Ser Pro Asp Phe Asn Phe Leu Phe Leu Val Pro Pro
210 215 220

Thr Val Glu Gln Lys Thr Arg Pro Phe Thr Leu Pro Asn Leu Pro Leu
225 230 235

Ser Ser Leu Ser Asn Ser Arg Ala Pro Leu Pro Ile Ser Ser Met Gly
240 245 250

Ile Ser Pro Asp Asn Val Gln Ser Val Gln Phe Gln Asn Gly Arg Cys
255 260 265 270

Thr Leu Asp Gly Arg Leu Val Gly Thr Thr Pro Val Ser Leu Ser His
275 280 285

Val Ala Lys Ile Arg Gly Thr Ser Asn Gly Thr Val Ile Asn Leu Thr
290 295 300

Glu Leu Asp Gly Thr Pro Phe His Pro Phe Glu Gly Pro Ala Pro Ile
305 310 315

Gly Phe Pro Asp Leu Gly Gly Cys Asp Trp His Ile Asn Met Thr Gln
320 325 330

Phe Gly His Ser Ser Gln Thr Gln Tyr Asp Val Asp Thr Thr Pro Asp
335 340 345 350

Thr Phe Val Pro His Leu Gly Ser Ile Gln Ala Asn Gly Ile Gly Ser
355 360 365

Gly Asn Tyr Val Gly Val Leu Ser Trp Ile Ser Pro Pro Ser His Pro
370 375 380

Ser Gly Ser Gln Val Asp Leu Trp Lys Ile Pro Asn Tyr Gly Ser Ser
385 390 395

Ile Thr Glu Ala Thr His Leu Ala Pro Ser Val Tyr Pro Pro Gly Phe
400 405 410

Gly Glu Val Leu Val Phe Phe Met Ser Lys Met Pro Gly Pro Gly Ala
415 420 425 430

Tyr Asn Leu Pro Cys Leu Leu Pro Gln Glu Tyr Ile Ser His Leu Ala
435 440 445

Ser Glu Gln Ala Pro Thr Val Gly Glu Ala Ala Leu Leu His Tyr Val
450 455 460

Asp Pro Asp Thr Gly Arg Asn Leu Gly Glu Phe Lys Ala Tyr Pro Asp
465 470 475

Gly Phe Leu Thr Cys Val Pro Asn Gly Ala Ser Ser Gly Pro Gln Gln
480 485 490

Leu Pro Ile Asn Gly Val Phe Val Phe Val Ser Trp Val Ser Arg Phe
495 500 505 510

Tyr Gln Leu Lys Pro Val Gly Thr Ala Ser Ser Ala Arg Gly Arg Leu
515 520 525 

Gly Leu Arg Arg 
530 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 212 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Ala Gln Ala Ile Ile
1 5

Gly Ala Ile Ala Ala Ser Thr Ala Gly Ser Ala Leu Gly Ala Gly Ile
10 15 20

Gln Val Gly Gly Glu Ala Ala Leu Gln Ser Gln Arg Tyr Gln Gln Asn
25 30 35

Leu Gln Leu Gln Glu Asn Ser Phe Lys His Asp Arg Glu Met Ile Gly
40 45 50

Tyr Gln Val Glu Ala Ser Asn Gln Leu Leu Ala Lys Asn Leu Ala Thr
55 60 65 70

Arg Tyr Ser Leu Leu Arg Ala Gly Gly Leu Thr Ser Ala Asp Ala Ala
75 80 85

Arg Ser Val Ala Gly Ala Pro Val Thr Arg Ile Val Asp Trp Asn Gly
90 95 100

Val Arg Val Ser Ala Pro Glu Ser Ser Ala Thr Thr Leu Arg Ser Gly
105 110 115

Gly Phe Met Ser Val Pro Ile Pro Phe Ala Ser Lys Gln Lys Gln Val
120 125 130

Gln Ser Ser Gly Ile Ser Asn Pro Asn Tyr Ser Pro Ser Ser Ile Ser
135 140 145 150

Arg Thr Thr Ser Trp Val Glu Ser Gln Asn Ser Ser Arg Phe Gly Asn
155 160 165

Leu Ser Pro Tyr His Ala Glu Ala Leu Asn Thr Val Trp Leu Thr Pro
170 175 180

Pro Gly Ser Thr Ala Ser Ser Thr Leu Ser Ser Val Pro Arg Gly Tyr
185 190 195

Phe Asn Thr Asp Arg Leu Pro Leu Phe Ala Asn Asn Arg Arg
200 205 210 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 551 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: human calicivirus Sapporo

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..549 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



TGTGATGCTG CCACCACGCT TATAGCCACC GCGGCTTTTA AGGCCGTGGC TACNAGGCTA 60

CAGGTGGTGA CACCAATGAC ACCAGTTGCT GTTGGCATTA ACATGGACTC TGTTCAGATG 120

CAAGTGATGA ATGACTCTTT AAAGGGGGGT GTTCTTTACT GTTTGGATTA TTCCAAATGG 180

GATTCCACAC AAAACCCTGC AGTGACAGCA GCCTCCCTGG CAATATTGGA GAGATTTGCT 240

GAGCCCCATC CAATTGTGTC TTGTGCCATT GAGGCTCTTT CCTCCCCTGC AGAGGGCTAT 300

GTCAATGATA TCAAATTTGT GACACGCGGC GGCCTACCAT CTGGGATGCC ATTTACATCT 360

GTCGTCAATT CTATCAACCA TATGATATAC GTGGCGGCAG CCATCCTGCA GGCATACGAA 420

AGCCACAATG TCCCATATAC TGGAAACGTC TTCCAAGTGG AGACCGTTCA CACGTATGGT 480

GATGATTGCA TGTACAGCGT GTGCCCTGCC ACTGCATCAA TTTTCCACAC TGTGCTTGCA 540

AACCTAACGT C 551



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Cys Asp Ala Ala Thr Thr Leu Ile Ala Thr Ala Ala Phe Lys Ala Ala
1 5 10 15

Val Xaa Arg Leu Gln Val Val Thr Pro Met Thr Pro Val Ala Val Gly
20 25 30

Ile Asn Met Asp Ser Val Gln Met Gln Val Met Asn Asp Ser Leu Lys
35 40 45

Gly Gly Val Leu Tyr Cys Leu Asp Tyr Ser Lys Trp Asp Ser Thr Gln
50 55 60

Asn Pro Ala Val Thr Ala Ala Ser Leu Ala Ile Leu Glu Arg Phe Ala
65 70 75 80

Glu Pro His Pro Ile Val Ser Cys Ala Ile Glu Ala Leu Ser Ser Pro
85 90 95

Ala Glu Gly Tyr Val Asn Asp Ile Lys Phe Val Thr Arg Gly Gly Leu
100 105 110

Pro Ser Gly Met Pro Phe Thr Ser Val Val Asn Ser Ile Asn His Met
115 120 125

Ile Tyr Val Ala Ala Ala Ile Leu Gly Ala Tyr Glu Ser His Asn Val
130 135 140

Pro Tyr Thr Gly Asn Val Phe Gln Val Glu Thr Val His Thr Tyr Gly
145 150 155 160

Asp Asp Cys Met Tyr Ser Val Cys Pro Ala Thr Ala Ser Ile Phe His
165 170 175 

Thr Val Leu Ala Asn Leu Thr 
180 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 148 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
TGTGATGCTG CCACCACGCT TATAGCCACC GCGGCTTTTA AGGCCGTGGC TACAGGCTAC 60
AGGTGGTGAC ACCAATGACA CCAGTTGCTG TTGGCATTAA CATGGACTCT GTTCAGATGC 120
AAGTGATGAA TGACTCTTTA AAGGGGGG 148



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 449 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

ATGGACTCTG TTCAGATGCA AGTGATGAAT GACTCTTTAA AGGGGGGTGT TCTTTACTGT 60 

TTGGATTATT CCAAATGGGA TTCCACACAA AACCCTGCAG TGACAGCAGC CTCCCTGGCA 120

ATATTGGAGA GATTTGCTGA GCCCCATCCA ATTGTGTCTT GTGCCATTGA GGCTCTTTCC 180

TCCCCTGCAG AGGGCTATGT CAATGATATC AAATTTGTGA CACGCGGCGG CCTACCATCT 240

GGGATGCCAT TTACATCTGT CGTCAATTCT ATCAACCATA TGATATACGT GGCGGCAGCC 300

ATCCTGCAGG CATACGAAAG CCACAATGTC CCATATACTG GAAACGTCTT CCAAGTGGAG 360

ACCGTTCACA CGTATGGTGA TGATTGCATG TACAGCGTGT GCCCTGCCAC TGCATCAATT 420 

TTCCACACTG TGCTTGCAAA CCTAACGTC 449 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 446 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: human calicivirus Sapporo (Day care)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

ATGGACTCTG TTCAGATGCA AGTGATGAAT GACTCTTTAA AGGGAGGTGT TCTCTACTGC 60 

CTGGATTACT CCAAATGGGA CTCCACACAA AATGCTGCAG TGACAGCAGC ATCCCTNNCA 120 

ATATTGGAGA GATTTGCTGA ACCCCACCCA ATTGTGTCTT GTGCCATTGA GGCCCTGNNC 180 

TCNNCTGCAG AGGGTTACGT TAATGATATC AAGTTTGTGA CACGTGGCGG CCTACCATGT 240 

GGGATGCCAT TCACATCTGT TGTCAATTCC ATCAACCACA TNATATACGT GGCAGCCGCC 300 

ATCCTGCAGG CATACGAAAG CCACAATGTT CCATACACTG GAAATGTCTT CCAAGTGGAG 360 

ACTGTTCACA CGTATGGTGA CGATTGCATG TACAGCGTGT GCCCTGCCAC CGCATCAATT 420

TTCCACACTG TACTTGCAAA CCTAAC 446 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 434 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: human calicivirus Houston 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 3..434

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

GGCCATGTTA TAGTGGTGTT CACATGAAAG ATGGCGACAA GATGTTGATA GATGCCAATC 60 

TTCCTTACAA CCAGAAATTA ACTACTATGA TTCATGAGAC TAGGCATAGG ATAGGACAGT 120

ATATAGATAA TACTTTTGGA AAGACATTTA GACATGGATT GACAAAACCT GCTGACAAGA 180

CTGTAGATTT GATCTATAAG ACATTGAATT ATGATGATTT TCTGGCAATA ATGCTAATCA 240

TATATGGGCA AAAGTCGGCC ACTAATACGG AGTTGGAATT CTTGATGGAG AAACTTAGAG 300 

GTTATGAATC TACAATGGAT GACATAGGGA AAGTCTATGG AGATGATAAA ATGAGAGATA 360 

TAATCAAGAA TATTTCTGAT GATGACATAA AGAGTCTTTT AGGGGAGATA AATAGTGATT 420 

ATTCTGGTAA GNAT 434 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 144 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Pro Cys Tyr Ser Gly Val His Met Lys Asp Gly Asp Lys Met Leu Ile
1 5 10 15

Asp Ala Asn Leu Pro Tyr Asn Gln Lys Leu Thr Thr Met Ile His Glu
20 25 30

Thr Arg His Arg Ile Gly Gln Tyr Ile Asp Asn Thr Phe Gly Lys Thr
35 40 45

Phe Arg His Gly Leu Thr Lys Pro Ala Asp Lys Thr Val Asp Leu Ile
50 55 60

Tyr Lys Thr Leu Asn Tyr Asp Asp Phe Leu Ala Ile Met Leu Ile Ile
65 70 75 80

Tyr Gly Gln Lys Ser Ala Thr Asn Thr Glu Leu Gln Phe Leu Met Glu
85 90 95

Lys Leu Arg Gly Tyr Glu Ser Thr Met Asp Asp Ile Gly Lys Val Tyr
100 105 110

Gly Asp Asp Lys Met Arg Asp Ile Ile Lys Asn Ile Ser Asp Asp Asp
115 120 125

Ile Lys Ser Leu Leu Gly Glu Ile Asn Ser Asp Tyr Ser Gly Lys Xaa
130 135 140 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2516 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/KY/89 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CAATAGAGGA TGGCCCTTTA ATTTATGCTG AACATGCCAA GTACAAAAAT CATTTTGATG 60

CAGATTACAC AGCATGGGAC TCTACACAAA ATAGACAAAT TATGACAGAA TGCTTCTCCA 120

TCATGTCACG CCTTACGGCC TCTCCAGAAC TAGCTGAGGT TGTAGCCCAG GACTTACTAG 180

CACCATCCGA GATGGATGTG GGCGACTATG TTATAAGGGT CAAAGAAGGC CTACCATCAG 240

GATTTCCCTG CACTTCTCAA GTGAATAGCA TAAATCACTG GATAATCACC CTTTGTGCAT 300

TGTCTGAGGC TACTGGCTTA TCACCTGATG TGGTACAGTC CATGTCATAC TTCTCATTCT 360

A0GGT6ATGA TGAGATCGTA TCAACTGACA TAGACTTTGA CCCAACTCGC CTCACCCAAA 420

TTCTCAAGGA ATACGGCCTC AAGCCAACAA GGCCAGACAA AACAGAAGGA CCAATACAGG 480

TGAGGAAGAA TGTGGATGGG CTAGTTTTTC TGCGGCGCAC CATCTCCCGG GACGCAGCAG 540

GGTTCCAAGG TAGACTGGAT AGAGCCTCAA TTGAACGTCA AATTTTCTGG ACCCGCGGGC 600




[illegible] AGACCCATCA GAGACTCTGG TACCACACAC CCAAAGGAAA GTCCAGCTGA 660

[illegible] [illegible] TCACTCCACG GGGAAAAATT TTACAGGAAA ATATCTAGCA 720

[illegible] [illegible] ACTGGTGGGC TGGAGATGTA TGTCCCAGGG TGGCAGGCCA 780

[illegible] [illegible] CATGACCTCG GATTGTGGAC AGGAGATCGC AATCTCCTGC 840

CCGAATTCGT AAATGATGAT GGCGTCTAAG GACGCTACGT [illegible] [illegible] 900


GCGTCGGTTC AGTTGGTACC GGAGGTTAAT GCTTCTGACC CTCTTGCAAT GGATCCTGTG 960

GCGGGTTCTT CAACAGCAGT TGCAACCGCT GGACAAGTTA ACCCTATTGA CCCTTGGATA 1020

ATCAATAACT TTGTGCAGGC TCCCCAAGGT GAATTTACTA TTTCTCCAAA TAATACCCCC 1080

GGTGATGTTT TGTTTGATTT GAGTCTAGGC CCTCATCTTA ATCCCTTCTT GTTACATTTG 1140

TCACAAATGT ATAATGGCTG GGTTGGCAAC ATGAGAGTTA GGATTATGCT GGCTGGTAAT 1200

GGATTTACTG CAGGCAAAAT TATAGTTTCT TGCATACCTC CTGGCTTTGG CTCCCAACAA 1260

CTTACTATAG CACAAGCAAC TCTCTTCCCG CATGTGATTG CTGATGTTAG GACTTTAGAC 1320

CCAATTGAAG TACCCTTGGA AGATGTAAGG AATGTTCTCT TTCATAATAA TGATAGAAAT 1380

CAACAAACTA TGCGCCTTGT GTGCATGCTT TATACCCCCC TCAGCACTGG TGGCGGTACA 1440

GGTGATTCTT TTGTGGTTGC AGGGCGAGTC ATGACTTGTC CTAGCCCCGA CTTTAATTTC 1500

TTGTTCTTGG TTCCTCCCAC AGTGGAACAG AAGACTAGGC CTTTCACCCT CCCAAATTTA 1560

CCGCTGAGTT CTTTGTCTAA TTCACGTGCT CCTCTTCCAA TTAGTGGCAT GGGTATTTCT 1620

CCAGATAATG TTCAGAGTGT GCAGTTCCAA AATGGCCGAT GTACCTTAGA TGGACGTCTT 1680

GTTGGCACCA CCCCAGTTTC CCTCTCCCAT GTTGCTAAGA TAAGGGGTAC TTCTAATGGT 1740

ACAGTAATCA ATCTCACCGA ATTGGATGGC ACCCCCTTCC ACCCTTTTGA AGGCCCTGCC 1800

CCTATTGGTT TTCCAGATCT TGGTGGCTGT GATTGGCATA TTAATATGAC ACAATTTGGA 1860

CATTCCAGTC AGACTCAGTA TGATGTAGAC ACCACCCCCG ACACCTCCGT CCCTCACTTA 1920

GGTTCAATCC AGGCGAATGG CATTGGTAGT GGCAACTATA TTGGTGTTCT TAGCTGGGTC 1980

TCCCCCCGAT CACATCCATC TGGCTCTCAA GTTGATCTCT GGAAGATCCC CAACTATGGG 2040

TCTAGTATCA CAGAGGCAAC CCATCTAGCT CCCTCTGTCT ATTCTCCTGG CTTTGGAGAG 2100

GTGCTAGTCT TTTTCATGTC AAAGATACCA GGTCCTGGTG GTGATAGTCT GCCCTGTTTA 2160

CTGCCACAAG GATATATCTC ACACCTTGCA AGTGAACAAG CCCCAACTGT TGGTGAGGGT 2220

CCCCTGCTCC ACTATGTTGA CCCTGACACG GACCGGAATC TTGGGGAGTT TAAGGCTTAC 2280

CCTGATGGTT TCCTAACCTG TGTCCCTAAT GGGGCCAGCT CGGGCCCACA ACAACTACCA 2340

ATCAATGGAG TCTTTGTCTT TGTTTCATGG GTGTCCAGAT TTTATCAGTT AAAGCCTGTG 2400

GGAACTGCCA GTACGGCAAG AGGTAGGCTT GGTTTGCGCC GATAATGGCT CAGGCTATAA 2460

TTGGTGCAAT TGCCGCCTCT ACAGCAGGTA GTGCTTTAGG GGCAGGTATA CAGGTT 2516



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 124 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: primate calicivirus

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

TGGACGGACC TGCTGTTGAA GATCTCTTCA AAGGCTCGAA CGACCAAAGC ACGATCGGTA 60 

TTGTGTTGAC TACGCAAAGT GGGACTCAAC CCACCACCAA AAGTAACATC CAATCAATGA 120 

CATC 124 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 110 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: primate calicivirus

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

GTGAATGACA TCTTCGACTC GATGGACCTA TTCACATATG GTGATGACGG TGTCTACATC 60 

GTCCCACCAC TATATCATCT GTCSTGCCCA AGTCTTCACC AACCTGAAAC 110 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CTTGTTGGTT TGAGGCCATA T 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: 
ATAAAAGTTG GCATGAACA 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GTTGACACAA TCTCATCATC 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
GGCCTGCCAT CTGGATTGCC 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
GGGCCCCCTG GTATAGGTAA 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 
TGGTGATGAC TATAGCATCA GACACAAA 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
ACTCACCCAA ATCCTCCA 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GTTCTGACCA CCTAACCT 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
AGTTTGGGTC CCCATCTTAA TCCTTT

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
TGAACCAAAA CCAGGGGG 



(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
AGCAAAGTCA TACATGAAAT 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CCATTATACA TTTGTAG 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 
ATTATAGTTT CTTGCATA 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CACACTCTGG ACATTGTCTG 20 

(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
CATTGGGTTT CCAGACCTA 19 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
ATAATTGGGG ATCTTCCAAA 20 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TAGTGGCATG GGTATTTC 18 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic)
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
TATGCCAATC ACAGCCAC 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
GTCTGGCTCC CAAGTTGACC 



(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
CGGTATCAGG GTCAACAT 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
TGAGGCTGCC CTGCTCCA 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

CGACCGCTGT CCGGGAGG
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(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
GTTGCTGTTG GCATTAACA 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 126 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Norwalk virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

His Phe Asp Ala Asp Tyr Thr Ala Trp Asp Ser Thr Gln Asn Arg Gln
1 5 10 15

Ile Met Thr Glu Ser Phe Ser Ile Met Ser Arg Leu Thr Ala Ser Pro
20 25 30

Glu Leu Ala Glu Val Val Ala Gln Asp Leu Leu Ala Pro Ser Glu Met
35 40 45

Asp Val Gly Asp Tyr Val Ile Arg Val Lys Glu Gly Pro Ser Gly Phe
50 55 60

Pro Cys Thr Ser Gln Val Asn Ser Ile Asn His Trp Ile Ile Thr Leu
65 70 75 80

Cys Ala Leu Ser Glu Ala Thr Gly Leu Ser Pro Asp Val Val Gln Ser
85 90 95

Met Ser Tyr Phe Ser Phe Tyr Gly Asp Asp Glu Ile Val Ser Thr Asp
100 105 110

Ile Asp Phe Asp Pro Ala Arg Leu Thr Gln Ile Leu Lys Glu
115 120 125

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 121 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: hepatitis E virus 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

Val Phe Glu Asn Asp Phe Ser Glu Phe Asp Ser Thr Gln Asn Asn Phe
1 5 10 15

Ser Leu Gly Leu Glu Cys Ala Ile Met Glu Glu Cys Gly Met Pro Gln
20 25 30

Trp Leu Ile Arg Leu Tyr His Leu Ile Arg Ser Ala Trp Ile Leu Gln
35 40 45

Ala Pro Lys Glu Ser Leu Arg Gly Phe Trp Lys Lys His Ser Lys His
50 55 60

Ser Gly Glu Pro Gly Thr Leu Leu Trp Asn Thr Val Trp Asn Met Ala
65 70 75 80

Val Ile Thr His Cys Tyr Asp Phe Arg Asp Phe Gln Val Ala Ala Phe
85 90 95

Lys Gly Asp Asp Ser Ile Val Leu Cys Ser Glu Tyr Arg Gln Ser Pro
100 105 110

Gly Ala Ala Val Leu Ile Ala Gly Cys
115 120

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 127 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: hepatitis C virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val Thr Glu Ser
1 5 10 15

Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp Leu Asp Pro
20 25 30

Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu Tyr Val Gly
35 40 45

Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr Arg Arg Cys
50 55 60

Arg Ala Ser Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
65 70 75 80

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
85 90 95

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
100 105 110

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe
115 120 125
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(2) INFORMATION FOR SEQ ID NO:41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 132 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: hepatitis A virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

Gly Leu Asp Leu Asp Phe Ser Ala Phe Asp Ala Ser Leu Ser Pro Phe
1 5 10 15

Met Ile Arg Glu Ala Gly Arg Ile Met Ser Glu Leu Ser Gly Thr Pro
20 25 30

Ser His Phe Gly Thr Ala Leu Ile Asn Thr Ile Ile Tyr Ser Lys His
35 40 45

Leu Leu Tyr Asn Cys Cys Tyr His Val Cys Gly Ser Met Pro Ser Gly
50 55 60

Ser Pro Cys Thr Ala Leu Leu Asn Ser Ile Ile Asn Asn Val Asn Leu
65 70 75 80

Tyr Tyr Val Phe Ser Lys Ile Phe Gly Lys Ser Pro Val Phe Phe Cys
85 90 95

Gln Ala Leu Lys Ile Leu Cys Tyr Gly Asp Asp Val Leu Ile Val Phe
100 105 110

Ser Arg Asp Val Gln Ile Asp Asn Leu Asp Leu Ile Gly Gln Lys Ile
115 120 125

Val Asp Glu Phe
130

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 158 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Japanese encephalitis virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

Met Tyr Ala Asp Asp Thr Ala Gly Trp Asp Thr Arg Ile Thr Arg Thr
1 5 10 15

Asp Leu Glu Asn Glu Ala Lys Val Leu Glu Leu Leu Asp Gly Glu His 
20 25 30 

Arg Met Leu Ala Arg Ala Ile Ile Glu Leu Thr Tyr Arg His Lys Val
35 40 45 
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Val Lys Val Met Arg Pro Ala Ala Glu Gly Lys Thr Val Met Asp Val 
50 55 60 

Ile Ser Arg Glu Asp Gln Arg Gly Ser Gly Gln Val Val Thr Tyr Ala
65 70 75 80

Leu Asn Thr Phe Thr Asn Ile Ala Val Gln Leu Val Arg Leu Met Glu
85 90 95

Ala Glu Gly Val Ile Gly Pro Gln His Leu Glu Gln Leu Pro Arg Lys
100 105 110

Thr Lys Ile Ala Val Arg Thr Trp Leu Phe Glu Asn Gly Glu Glu Arg
115 120 125

Val Thr Arg Met Ala Ile Ser Gly Asp Asp Cys Val Val Lys Pro Leu
130 135 140

Asp Asp Arg Phe Ala Thr Ala Leu His Phe Leu Asn Ala Met
145 150 155

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 117 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Poliovirus

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 

Phe Ala Phe Asp Tyr Thr Gly Tyr Asp Ala Ser Leu Ser Pro Ala Trp
1 5 10 15

Phe Glu Ala Leu Lys Met Val Leu Glu Lys Ile Gly Phe Gly Asp Arg
20 25 30

Val Asp Tyr Ile Asp Tyr Leu Asn His Ser His His Leu Tyr Lys Asn
35 40 45

Lys Thr Tyr Cys Val Lys Gly Gly Met Pro Ser Gly Cys Ser Gly Thr
50 55 60

Ser Ile Phe Asn Ser Met Ile Asn Asn Leu Ile Ile Arg Thr Leu Leu
65 70 75 80

Leu Lys Thr Tyr Lys Gly Ile Asp Leu Asp His Leu Lys Met Ile Ala
85 90 95

Tyr Gly Asp Asp Val Ile Ala Ser Tyr Pro His Glu Val Asp Ala Ser
100 105 110

Leu Leu Ala Gln Ser
115

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 121 amino acids 
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(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Foot-and-mouth disease virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

Val Trp Asp Val Asp Tyr Ser Ala Phe Asp Ala Asn His Cys Ser Asp
1 5 10 15

Ala Met Asn Ile Met Phe Glu Glu Val Phe Arg Thr Asp Phe Gly Phe
20 25 30

His Pro Asn Ala Glu Trp Ile Leu Lys Thr Leu Val Asn Thr Glu His
35 40 45

Ala Tyr Glu Asn Lys Arg Ile Thr Val Glu Gly Gly Met Pro Ser Gly
50 55 60

Cys Ser Ala Thr Ser Ile Ile Asn Thr Ile Leu Asn Asn Ile Tyr Val
65 70 75 80

Leu Tyr Ala Leu Arg Arg His Tyr Glu Gly Val Glu Leu Asp Thr Tyr
85 90 95

Thr Met Ile Ser Tyr Gly Asp Asp Ile Val Val Ala Ser Asp Tyr Asp
100 105 110

Leu Asp Phe Glu Ala Leu Lys Pro His
115 120

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 126 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: encephalomyocarditis virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

Val Tyr Asp Val Asp Tyr Ser Asn Phe Asp Ser Thr His Ser Val Ala
1 5 10 15

Met Phe Arg Leu Leu Ala Glu Glu Phe Phe Thr Pro Glu Asn Gly Phe
20 25 30

Asp Pro Leu Thr Arg Glu Tyr Leu Glu Ser Leu Ala Ile Ser Thr His
35 40 45

Ala Phe Glu Glu Lys Arg Phe Leu Ile Thr Gly Gly Leu Pro Ser Gly
50 55 60

Cys Ala Ala Thr Ser Met Leu Asn Thr Ile Met Asn Asn Ile Ile Ile
65 70 75 80

Arg Ala Gly Leu Tyr Leu Thr Tyr Lys Asn Phe Glu Phe Asp Asp Val
85 90 95

Lys Val Leu Ser Tyr Gly Asp Asp Leu Leu Val Ala Thr Asn Tyr Gln
100 105 110

Leu Asp Phe Asp Lys Val Arg Ala Ser Leu Ala Lys Thr Gly
115 120 125

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 122 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Sindbis virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Val Leu Glu Thr Asp Ile Ala Ser Phe Asp Lys Ser Gln Asp Asp Ala
1 5 10 15

Met Ala Leu Thr Gly Leu Met Ile Leu Glu Asp Leu Gly Val Asp Gln
20 25 30

Pro Leu Leu Asp Leu Ile Glu Cys Ala Phe Gly Glu Ile Ser Ser Thr
35 40 45

His Leu Pro Thr Gly Thr Arg Phe Lys Phe Gly Ala Met Met Lys Ser
50 55 60

Gly Met Phe Leu Thr Leu Phe Val Asn Thr Val Leu Asn Val Val Ile
65 70 75 80

Ala Ser Arg Val Leu Glu Glu Arg Leu Lys Thr Ser Arg Cys Ala Ala
85 90 95

Phe Ile Gly Asp Asp Asn Ile Ile His Gly Val Val Ser Asp Lys Glu
100 105 110

Met Ala Glu Arg Cys Ala Thr Trp Leu Asn
115 120

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 124 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: tobacco mosaic virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

Val Leu Glu Leu Asp Ile Ser Lys Tyr Asp Lys Ser Gln Asn Glu Phe
1 5 10 15

His Cys Ala Val Glu Tyr Glu Ile Trp Arg Arg Leu Gly Phe Glu Asp
20 25 30

Phe Leu Gly Glu Val Trp Lys Gln Gly His Arg Lys Thr Thr Leu Lys
35 40 45

Asp Ile Thr Ala Gly Tyr Lys Thr Cys Ile Trp Tyr Gln Arg Lys Ser
50 55 60

Gly Asp Val Thr Thr Phe Ile Gly Asn Thr Val Ile Ile Ala Ala Cys
65 70 75 80

Leu Ala Ser Met Leu Pro Met Glu Lys Ile Ile Lys Gly Ala Phe Cys
85 90 95

Gly Asp Asp Ser Leu Leu Tyr Phe Pro Lys Gly Cys Glu Phe Pro Asp
100 105 110

Val Gln His Ser Ala Asn Leu Met Trp Asn Phe Glu
115 120

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 125 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: alfalfa mosaic virus

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

Phe Lys Glu Ile Asp Phe Ser Lys Phe Asp Lys Ser Gln Asn Glu Leu
1 5 10 15

His His Leu Ile Gln Glu Arg Phe Leu Lys Tyr Leu Gly Ile Pro Asn
20 25 30

Glu Phe Leu Thr Leu Trp Phe Asn Ala His Arg Lys Ser Arg Ile Ser
35 40 45

Asp Ser Lys Asn Gly Val Phe Phe Asn Val Asp Phe Gln Arg Arg Thr
50 55 60

Gly Asp Ala Leu Thr Tyr Leu Gly Asn Thr Ile Val Thr Leu Ala Cys
65 70 75 80

Leu Cys His Val Tyr Asp Leu Met Asp Pro Asn Val Lys Phe Val Val
85 90 95

Ala Ser Gly Asp Asp Ser Leu Ile Gly Thr Val Glu Glu Leu Pro Arg
100 105 110

Asp Gln Glu Phe Leu Phe Thr Thr Leu Phe Asn Leu Glu
115 120 125



(2) INFORMATION FOR SEQ ID NO: 49: 
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 122 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: brome mosaic virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

Phe Leu Glu Ala Asp Leu Ser Lys Phe Asp Lys Ser Gln Gly Glu Leu
1 5 10 15

His Leu Glu Phe Gln Arg Glu Ile Leu Leu Ala Leu Gly Phe Pro Ala
20 25 30

Pro Leu Thr Asn Trp Trp Ser Asp Phe His Arg Asp Ser Tyr Leu Ser
35 40 45

Asp Pro His Ala Lys Val Gly Met Ser Val Ser Phe Gln Arg Arg Thr
50 55 60

Gly Asp Ala Phe Thr Tyr Phe Gly Asn Thr Leu Val Thr Met Ala Met
65 70 75 80

Ile Ala Tyr Ala Ser Asp Leu Ser Asp Cys Asp Cys Ala Ile Phe Ser
85 90 95

Gly Asp Asp Ser Leu Ile Ile Ser Lys Val Lys Pro Val Leu Asp Thr
100 105 110

Asp Met Phe Thr Ser Leu Phe Asn Met Glu
115 120

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 142 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: cowpea mosaic virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 

Val Leu Cys Cys Asp Tyr Ser Ser Phe Asp Gly Leu Leu Ser Lys Gln
1 5 10 15

Val Met Asp Val Ile Ala Ser Met Ile Asn Glu Leu Cys Gly Gly Glu
20 25 30

Asp Gln Leu Lys Asn Ala Arg Arg Asn Leu Leu Met Ala Cys Cys Ser
35 40 45

Arg Leu Ala Ile Cys Lys Asn Thr Val Trp Arg Val Glu Cys Gly Ile
50 55 60

Pro Ser Gly Phe Pro Met Thr Val Ile Val Asn Ser Ile Phe Asn Glu
65 70 75 80

Ile Leu Ile Arg Tyr His Tyr Lys Lys Leu Met Arg Glu Gln Gln Ala
85 90 95

Pro Glu Leu Met Val Gln Ser Phe Asp Lys Leu Ile Gly Leu Val Thr
100 105 110

Tyr Gly Asp Asp Asn Leu Ile Ser Val Asn Ala Val Val Thr Pro Tyr
115 120 125

Phe Asp Gly Lys Lys Leu Lys Gln Ser Leu Ala Gln Gly Gly
130 135 140



(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
CACGCGGAGG CTCTCAAT 18 



(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
GGTGGCGAAG CGGCCCTC 18 



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
TCAGCAGTTA TAGATATG 18 



(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs 
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(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
ATGCTATATA CATAGGTC 



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
CAACAGGTAC TACGTGAC 



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
TGTGGCCCAA GATTTGCT 



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
ATAAAAGTTG GCATGAACAC AAAT 



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
GTTGCTGTTG GCATTAACAT GGAC 



(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
GTTCCTGTTG GCATAAACAT GGAC 



(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
GTTCCGGTTG GCATTAACAT GGAC 



(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
GTTCCGGTTG GTATCAACAT GGAC 



(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
GTTGCGGTTG GTGTTGACAT GAGA
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(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/CDC 6/91 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

ATGCACTTCA CAGGTGAATA GCATCAACCA CTGGATCCTA ACTCTATGTG CATTGTCAGA 60

AGTCACTGGC TTGTCCCCTG ATGTGATACA ATCACAATCT TATTTCTCAT TTTATGGT 118 



(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/UT/88 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 

ATGTACCTCA CAAGTGAACA GCATCAATCA CTGGATTTTG ACCTTGTGGG GCCTATCAGA 60 

AGTTACTGGT CTGGCTCCTG ATGTAATACA GTCACAATCT TACTTTTCAT TCTATGGT 118 



(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 117 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Snow Mountain Agent/78 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:65: 

CTGCACATCA CAGTGGAATT CCATGCCCAC TGGCTCCTCA CACTCTGTGC ACTATCTGAA 60 

GTCACAAACC TGGCTCCTGA CATCATACAA GCTAACTCCT TGTTCTCTTT CTATGGT 117 



(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 base pairs 
(B) TYPE: nucleic acid






(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/CAMBRIDGE, UK 92

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 

CTGCACCTCA CAGTGGAACT CCATTGCCCA CTGGTTGCTT ACTCTGTGTG CCCTTTCTGA 60

AGTGACAGGA CTAGGCCCCG ACATCATACA AGCTAATTCC ATGTACTCTT TCTATGGT 118



(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/CDC 32 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 

TTGCACCTCA CAGTGGAACT CCATTGCCCT CTGGTTGCTT ACTCTGTGTG CCCTTTCTGA 60

AGTGACAGGA CTAGGCCCCG ACATCATACA AGCTAATTCC ATGTACTCTT TCTATGGT 118



(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Norwalk virus/8FIIa/68 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 

ATGTACTTCC CAGGTGAACA GCATAAATCA CTGGATAATT ACTCTCTGTG CACTGTCTGA 60 

GGCCACTGGT TTATCACCTG ATGTGGTGCA ATCCATGTCA TATTTCTCAT TTTATGGT 118 



(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 



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(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV-3/88 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

CTGCACTTCT CAAGTAAATA GCATAAATCA CTGGATAATC ACCCTTTGTG CACTGTCTGA 60 

GGCTACTGGC TTATCACCTG ATGTGGTGCA GTCCATGTCA TACTTCTCAT TTTACGGT 118 



(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 118 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/KY89/89 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

CTGCACTTCT CAAGTGAATA GCATAAATCA CTGGATAATC ACCCTTTGTG CATTGTCTGA 60

GGCTACTGGC TTATCACCTG ATGTGGTACA GTCCATGTCA TACTTCTCAT TCTACGGT 118 



(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 279 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Norwalk virus/8FIIa/68

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 

CAATAGAAGA TGGCCCCCTC ATCTATGCTG AGCATGCTAA ATATAAGAAT CATTTTGATG 60 

CAGATTATAC AGCATGGGAC TCAACACAAA ATAGACAAAT TATGACAGAA TCCTTCTCCA 120 

TTATGTCGCG CCTTACGGCC TCACCAGAAT TGGCCGAGGT TGTGGCCCAA GATTTGCTAG 180 

CACCATCTGA GATGGATGTA GGTGATTATG TCATCAGGGT CAAAGAGGGG CTGCCATCTG 240 

GATTCCCATG TACTTCCCAG GTGAACAGCA TAAATCACT 279 



(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 279 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: SRSV-3/88

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:

CAATAGAGGA TGGCCCTTTA ATTTATGCTG AGCATGCCAA GTACAAAAAT CATTTTGATG 60

CAGATTACAC AGCATGGGAC TCTACACAAA ATAGACAAAT AATGACAGAA TCCTTTTCCA 120

TCATGTCACG CCTCACGGCC TCTCCAGAAC TAGCTGAGGT TGTAGCCCAG GACTTGCTAG 180

CACCATCCGA GATGGATGTG GGTGACTATG TTATAAGGGT CAAAGAAGGC CTACCATCAG 240

GATTTCCCTG CACTTCTCAA GTAAATAGCA TAAATCACT 279



(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 279 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: SRSV/KY89

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:

CAATAGAGGA TGGCCCTTTA ATTTATGCTG AACATGCCAA GTACAAAAAT CATTTTGATG 60 

CAGATTACAC AGCATGGGAC TCTACACAAA ATAGACAAAT TATGACAGAA TCCTTCTCCA 120 

TCATGTCACG CCTTACGGCC TCTCCAGAAC TAGCTGAGGT TGTAGCCCAG GACTTACTAG 180 

CACCATCCGA GATGGATGTG GGCGACTATG TTATAAGGGT CAAAGAAGGC CTACCATCAG 240 

GATTTCCCTG CACTTCTCAA GTGAATAGCA TAAATCACT 279 



(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 279 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/ Cambridge, UK/92 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:74: 

TGTATGAAGA TGGTACCATA ATATTTGAGA AACATTCCAG ATACAGATAC CACTATGATG 60 

CAGATTATCC CGCTGGGTAC TCCACGCAGC AACGGGCAGT GTTGGCAGCA GCACTTGAAA 120 

TCATGGTGAG GTTCTCTGCT GAACCACAGC TAGCGCAAAT AGTAGCTGAA GATCTGCTAG 180

CACCAAGTGT AGTTGATGTG GGTGACTTCA AGATCACCAT TAATGAAGGC CTACCTTCTG 240
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GTGTGCCCTG CACCTCACAG TGGAACTCCA TTGCCCACT 279 



(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 277 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Snow Mountain Agent/78 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 

GAATGAGGAT GGACCCATAA TTTTTGAAAA GCACTCCAGG TTCTCATACC ACTATGATGC 60 

AGATTACTCA CGCTGGGACT CAACCCAACA GAGGGCAGTG CTAGCTGCAG CCTTGGAAAT 120 

CATGGTAAAA TTCTCACCAG AACCACATTT GGCCCAAATT GTTGCAGAGG ATCTCCTAGC 180 

CCCCAGTGTG ATGGATGTAG GTGATTTCAA AATAACAATT AATGAGGGAC TGCCCTCGGG 240 

AGTACCCTGC ACATCACAGT GGAATTCCAT GCCCACT 277 
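
By way of illustration only, the relationship between the nucleotide fragment of SEQ ID NO: 71 and the polymerase peptide of SEQ ID NO: 38 can be checked by conceptual translation. The following Python sketch assumes the standard genetic code; the names `translate`, `CODON_TABLE` and `SEQ_ID_38_START` are hypothetical helpers introduced for this example and form no part of the listing.

```python
# Sketch: translate the SEQ ID NO: 71 fragment in each forward reading frame
# and report the frame whose product contains the start of SEQ ID NO: 38.
# Assumes the standard genetic code (an assumption, not stated in the listing).

SEQ_ID_71 = (
    "CAATAGAAGATGGCCCCCTCATCTATGCTGAGCATGCTAAATATAAGAATCATTTTGATG"
    "CAGATTATACAGCATGGGACTCAACACAAAATAGACAAATTATGACAGAATCCTTCTCCA"
    "TTATGTCGCGCCTTACGGCCTCACCAGAATTGGCCGAGGTTGTGGCCCAAGATTTGCTAG"
    "CACCATCTGAGATGGATGTAGGTGATTATGTCATCAGGGTCAAAGAGGGGCTGCCATCTG"
    "GATTCCCATGTACTTCCCAGGTGAACAGCATAAATCACT"
)

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate complete codons of dna starting at offset frame (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# First 16 residues of SEQ ID NO: 38 in one-letter code
# (His Phe Asp Ala Asp Tyr Thr Ala Trp Asp Ser Thr Gln Asn Arg Gln).
SEQ_ID_38_START = "HFDADYTAWDSTQNRQ"

for frame in range(3):
    protein = translate(SEQ_ID_71, frame)
    if SEQ_ID_38_START in protein:
        print(f"frame offset {frame}: {protein}")
```

The same helper may be applied to the fragments of SEQ ID NOS: 72 through 75 to compare the deduced peptides of the other strains.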
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CLAIMS 

1. A cDNA sequence of the formula shown in Table 2 and 
fragments and derivatives thereof having sufficient size to bind a Norwalk 
or Norwalk-related virus genome. 

2. A protein encoded by nucleotides including nucleotides 1
through 7753 of the Norwalk virus genome shown in Table 2 or fragments
or derivatives thereof.

3. The protein of claim 2, wherein said protein is produced in
a prokaryotic expression system or a eukaryotic expression system.

4. The protein of claim 2, wherein said protein is produced by
chemical methods.

5. A protein encoded by nucleotides 146 through 5359 of the
Norwalk virus genome shown in Table 2 or fragments or derivatives
thereof.

6. The protein of claim 5, wherein said protein is produced in
a prokaryotic expression system or eukaryotic expression system.

7. The protein of claim 5, wherein said protein is produced by
chemical methods.

8. A RNA-dependent RNA polymerase encoded by nucleotides
4543 to 4924 of the Norwalk virus genome shown in Table 2 or fragments.

9. The RNA polymerase of claim 8, wherein said RNA
polymerase is produced in a prokaryotic expression system or a eukaryotic
expression system.

10. The RNA polymerase of claim 8, wherein said RNA
polymerase is produced by chemical methods.

11. A protein encoded by nucleotides 5337 through 7573 of the
Norwalk virus genome shown in Table 2 or fragments or derivatives
thereof.

12. The protein of claim 11, wherein said protein is produced in
a prokaryotic expression system or eukaryotic expression system.

13. The protein of claim 11, wherein said protein is produced by
chemical methods.

14. A protein encoded by nucleotides 5346 through 6935 of the
Norwalk virus genome shown in Table 2 or fragments or derivatives
thereof.

15. The protein of claim 14, wherein said protein is produced in
a prokaryotic expression system or eukaryotic expression system.

16. The protein of claim 14, wherein said protein is produced by
chemical methods.

17. A protein encoded by nucleotides 6938 through 7573 of the
Norwalk virus genome shown in Table 2 or fragments or derivatives
thereof.

18. The protein of claim 17, wherein said protein is produced in
a prokaryotic expression system or eukaryotic expression system.

19. The protein of claim 17, wherein said protein is produced by 
chemical methods. 
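
By way of illustration only, the nucleotide ranges recited in claims 2, 5, 8, 11, 14 and 17 can be converted to lengths as sketched below in Python. The coordinates are taken from the claims as printed; the dictionary `CLAIMED_REGIONS` and the function `region_length` are hypothetical names introduced for this example, and no conclusion about reading frames is implied beyond the arithmetic shown.

```python
# Sketch: length of each claimed nucleotide range (inclusive, 1-based
# coordinates as recited in the claims) and its division into codons.

CLAIMED_REGIONS = {
    "claim 2": (1, 7753),
    "claim 5": (146, 5359),
    "claim 8": (4543, 4924),
    "claim 11": (5337, 7573),
    "claim 14": (5346, 6935),
    "claim 17": (6938, 7573),
}


def region_length(start, end):
    """Number of nucleotides from start through end, both inclusive."""
    return end - start + 1


for claim, (start, end) in CLAIMED_REGIONS.items():
    length = region_length(start, end)
    codons, remainder = divmod(length, 3)
    print(f"{claim}: nt {start}-{end} = {length} nt "
          f"({codons} complete codons, {remainder} nt left over)")
```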
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20. A method of making a RNA probe to detect Norwalk or 
Norwalk-related viruses, comprising the steps of: 

subcloning a Norwalk virus cDNA clone into a transcription 
vector; 

growing said cDNA containing transcription vector;

adding RNA polymerase to generate single stranded RNA by
in vitro transcription; and

isolating said single stranded RNA.

21. A method of identifying Norwalk or Norwalk-related viruses
in a sample suspected of containing Norwalk or Norwalk-related viruses,
comprising the steps of:

adding a cDNA or a RNA probe specific to Norwalk virus or
a Norwalk-related virus to said sample to be tested under
conditions in which the cDNA or RNA probe will bind to the
Norwalk or Norwalk-related virus genome; and

measuring the amount of binding of said cDNA or RNA
probe.

22. The method of claim 21, wherein said sample is selected from
the group consisting of food, water and stool.

23. The method of claim 21, wherein said cDNA is selected from
a group consisting of pUCNV-953, pUCNV-4145, pUCNV-4095, pUCNV-5030
and pUCNV-5101 or fragments or derivatives thereof.

24. A method of identifying Norwalk or Norwalk-related viruses
in a sample suspected of containing Norwalk or Norwalk-related viruses
comprising the steps of:

adding at least two oligonucleotides each of about 10 
nucleotides or greater to said sample under conditions in which 
said oligonucleotides bind to the Norwalk or Norwalk-related virus 
genome; 
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amplifying a nucleotide sequence between said bound 
oligonucleotides; and 

measuring the amount of amplified sequence. 
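
By way of illustration only, the amplification of a sequence lying between two bound oligonucleotides can be modelled in silico as sketched below in Python. The template is the fragment of SEQ ID NO: 71 and the forward primer is SEQ ID NO: 56, which occurs within that fragment. The reverse primer is hypothetical: it is simply derived from the last 20 nucleotides of the fragment for the purpose of the example. The function names are likewise assumptions, and only exact matches are considered.

```python
# Sketch: locate two primers on a template and return the segment that an
# amplification reaction would be expected to copy. Exact matching only;
# mismatches, degeneracy and reaction conditions are not modelled.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the amplified segment, or None if either primer has no exact site."""
    start = template.find(forward)
    if start < 0:
        return None
    # The reverse primer anneals to the opposite strand, so its site on the
    # listed strand is its reverse complement, downstream of the forward primer.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(site)]


TEMPLATE = (  # SEQ ID NO: 71
    "CAATAGAAGATGGCCCCCTCATCTATGCTGAGCATGCTAAATATAAGAATCATTTTGATG"
    "CAGATTATACAGCATGGGACTCAACACAAAATAGACAAATTATGACAGAATCCTTCTCCA"
    "TTATGTCGCGCCTTACGGCCTCACCAGAATTGGCCGAGGTTGTGGCCCAAGATTTGCTAG"
    "CACCATCTGAGATGGATGTAGGTGATTATGTCATCAGGGTCAAAGAGGGGCTGCCATCTG"
    "GATTCCCATGTACTTCCCAGGTGAACAGCATAAATCACT"
)
FORWARD = "TGTGGCCCAAGATTTGCT"                # SEQ ID NO: 56
REVERSE = reverse_complement(TEMPLATE[-20:])  # hypothetical reverse primer

amplicon = in_silico_pcr(TEMPLATE, FORWARD, REVERSE)
print(len(amplicon), amplicon)
```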

25. A method of identifying Norwalk or Norwalk-related viruses 
in a sample suspected of containing Norwalk or Norwalk-related viruses
comprising the steps of:

isolating said nucleic acids using CTAB procedure;
amplifying nucleic acid; and
measuring the amplified product.

26. The method of claim 25, wherein the CTAB procedure
includes:

extracting said sample with genetron;
removing the supernatant of said genetron extracted
sample;

precipitating viruses in said supernatant with polyethylene
glycol;

treating said precipitate with proteinase K in the presence
of SDS at about 30° minutes;

sequentially extracting said treated precipitate with phenol-chloroform
and then chloroform;

forming a mixture by adding a solution of about 5% CTAB
and about 0.4M NaCl to said supernatant of said sequentially
extracted sample at a ratio of about 5:2 sample:CTAB;
incubating said mixture;

centrifuging said mixture to collect nucleic acids;
suspending said nucleic acids in 1M NaCl and thereafter
extracting with chloroform.
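
By way of illustration only, the volumes implied by the 5:2 sample:CTAB ratio of claim 26 can be computed as sketched below in Python. The sketch assumes that the sample itself contributes no CTAB or NaCl before mixing and that volumes are additive; the function name `ctab_mix` and the example sample volume are hypothetical.

```python
# Sketch: volume of the 5% CTAB / 0.4 M NaCl solution to add to a sample at a
# 5:2 sample:CTAB ratio, and the resulting concentrations in the mixture.
# Assumes additive volumes and no CTAB or NaCl in the sample beforehand.


def ctab_mix(sample_ml, ctab_pct=5.0, nacl_molar=0.4, ratio=(5, 2)):
    """Return reagent volume, total volume and final concentrations."""
    sample_parts, ctab_parts = ratio
    reagent_ml = sample_ml * ctab_parts / sample_parts
    total_ml = sample_ml + reagent_ml
    dilution = reagent_ml / total_ml  # fraction of the mixture that is reagent
    return {
        "reagent_ml": reagent_ml,
        "total_ml": total_ml,
        "final_ctab_pct": ctab_pct * dilution,
        "final_nacl_molar": nacl_molar * dilution,
    }


mix = ctab_mix(sample_ml=0.5)  # example: 0.5 ml of extracted supernatant
for name, value in mix.items():
    print(f"{name}: {value:.3f}")
```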

27. A method of claim 25 further comprising: 

performing reverse transcription on said nucleic acids; 
amplifying nucleic acids using primers; and 
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detecting the amplified nucleic acids using agarose gel 
electrophoresis. 

28. A method of cloning Norwalk or pathogens from food, 
biological and environmental samples, comprising: 

isolating said nucleic acids using CTAB procedure;

amplifying nucleic acids; and 

incorporating said amplified nucleic acids into vectors. 

29. A primer sequence of the formula CTT GTT GGT TTG AGG 
CCA TAT. 

10 30. A primer sequence of the formula ATA AAA GTT GGC ATG 
AAC A. 

31. A primer sequence of the formula GTT GAC ACA ATC TCA 
TCA TC. 

32. A primer sequence of the formula GGC CTG CCA TCT GGA 
15 TTG CC. 

33. A primer sequence of the formula GGG CCC CCT GGT ATA 
GGT AA. 

34. A primer sequence of the formula TGG TGA TGA CTA TAG 
CAT CAG ACA CAA A. 

35. A primer sequence of the formula ACT CAC CCA AAT CCT 
CCA. 

36. A primer sequence of the formula GTT CTG ACC ACC TAA 
CCT. 

37. A primer sequence of the formula AGT TTG GGT CCC CAT 
CTT AAT CCT TT. 

38. A primer sequence of the formula TGA ACC AAA ACC AGG 
GGG. 

39. A primer sequence of the formula AGC AAA GTC ATA CAT 
GAA AT. 

40. A primer sequence of the formula CCA TTA TAC ATT TGT 
AG. 

41. A primer sequence of the formula ATT ATA GTT TCT TGC 
ATA. 

42. A primer sequence of the formula CAC ACT CTG GAC ATT 
GTC TG. 

43. A primer sequence of the formula CAT TGG GTT TCC AGA 
CCT A. 

44. A primer sequence of the formula ATA ATT GGG GAT CTT 
CCA AA. 

45. A primer sequence of the formula TAG TGG CAT GGG TAT 
TTC. 

46. A primer sequence of the formula TAT GCC AAT CAC AGC 
CAC. 

47. A primer sequence of the formula GTC TGG CTC CCA AGT 
TGA CC. 

48. A primer sequence of the formula CGG TAT GAG GGT CAA 
CAT. 

49. A primer sequence of the formula TGA GGC TGC CCT GCT 
CCA. 

50. A primer sequence of the formula CCA CCG CTG TCC GGG 
AGG. 

51. A primer sequence of the formula GTT GCT GTT GGC ATT 
AAC A. 
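
The primer claims give each sequence in triplet-spaced form. As an illustration only, the sketch below strips that spacing and reports length, GC fraction and reverse complement for three of the claimed primers; the helper names are assumptions introduced here, and the claims themselves do not specify any of these derived properties.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(primer):
    """Remove the triplet spacing used in the claims and upper-case the bases."""
    return "".join(primer.split()).upper()


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences copied from claims 29, 35 and 51; keys are claim numbers.
primers = {
    29: "CTT GTT GGT TTG AGG CCA TAT",
    35: "ACT CAC CCA AAT CCT CCA",
    51: "GTT GCT GTT GGC ATT AAC A",
}

for claim, text in primers.items():
    seq = normalize(text)
    print(claim, len(seq), f"GC={gc_fraction(seq):.2f}", reverse_complement(seq))
```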



52. A method of making a probe to detect Norwalk or Norwalk- 
related viruses, comprising the steps of: 

synthesizing one or more short or long nucleotides from the 
Norwalk virus genome shown in Table 2 or fragments or 
derivatives thereof. 



53. The probe produced by the method of claim 52. 



54. A method of making a probe to detect Norwalk or Norwalk- 
related viruses, comprising the step of: 

synthesizing one or more short or long nucleotides from a 
subgenomic region of the Norwalk virus genome shown in Table 2 
or fragments or derivatives thereof. 

55. The probe produced by the method of claim 54. 

56. The probe of claim 55, wherein said subgenomic region 
includes a sequence of the formula CTT GTT GGT TTG AGG CCA TAT. 

57. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula ATA AAA GTT GGC ATG 
AAC A. 

58. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula GTT GAC ACA ATC TCA 
TCA TC. 

59. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula GGC CTG CCA TCT GGA 
TTG CC. 

60. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula GGG CCC CCT GGT ATA 
GGT AA. 

61. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula TGG TGA TGA CTA TAG 
CAT CAG ACA CAA A. 

62. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula GTT CTG ACC ACC TAA 
CCT. 

63. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula AGT TTG GGT CCC CAT 
CTT AAT CCT TT. 

64. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula TGA ACC AAA ACC AGG 
GGG. 



65. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula AGC AAA GTC ATA CAT 
GAA AT. 

66. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula CCA TTA TAC ATT TGT 
AG. 

67. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula CAC ACT CTG GAC ATT 
GTC TG. 

68. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula CAT TGG GTT TCC AGA 
CCT A. 

69. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula ATA ATT GGG GAT CTT 
CCA AA. 

70. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula TAT GCC AAT CAC AGC 
CAC. 

71. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula GTC TGG CTC CCA AGT 
TGA CC. 

72. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula CGG TAT CAG GGT CAA 
CAT. 

73. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula TGA GGC TGC CCT GCT 
CCA. 

74. The probe of claim 55, wherein said subgenomic region 
includes a nucleotide sequence of the formula CCA CCG CTG TCC GGG 
AGG. 
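
Whether a subgenomic region "includes" one of the recited sequences can be checked by an exact search on both strands. The sketch below shows one way to do that; the region used here is a hypothetical toy string built around the sequence of claim 64 (the genome of Table 2 is not reproduced), and `find_probe` is an illustrative helper, not a procedure described in this application.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_probe(region, probe):
    """Return (strand, 1-based start) of the first exact match, or None if absent."""
    probe = "".join(probe.split()).upper()
    pos = region.find(probe)
    if pos >= 0:
        return "+", pos + 1
    pos = region.find(reverse_complement(probe))
    if pos >= 0:
        return "-", pos + 1
    return None


# Hypothetical region: arbitrary 4-base flanks around the claim 64 sequence.
toy_region = "AAAA" + "TGAACCAAAACCAGGGGG" + "TTTT"

print(find_probe(toy_region, "TGA ACC AAA ACC AGG GGG"))
print(find_probe(reverse_complement(toy_region), "TGA ACC AAA ACC AGG GGG"))
print(find_probe(toy_region, "CCA CCG CTG TCC GGG AGG"))
```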

75. The method of claim 54, wherein said subgenomic region 
includes said Norwalk genome's first open reading frame. 

76. The probe produced by the method of claim 75. 

77. The method of claim 54, wherein said subgenomic region 
includes nucleotides 146 through 5359. 

78. The probe produced by the method of claim 77. 

79. The method of claim 54, wherein said nucleotides code for a 
picornavirus 2C-like protein, a 3C-like protease, an RNA-dependent RNA 
polymerase or any combination thereof. 

80. The probe produced by the method of claim 79. 

81. The method of claim 54, wherein said nucleotide codes for a 
capsid protein. 

82. The probe produced by the method of claim 81. 

83. The method of claim 54, wherein said subgenomic region 
includes nucleotides 5337 through 7573. 

84. The probe produced by the method of claim 83. 

85. The method of claim 54, wherein said subgenomic region 
includes nucleotides 5346 through 6935. 

86. The probe produced by the method of claim 85. 

87. The method of claim 54, wherein said subgenomic region 
includes nucleotides 6938 through 7573. 

88. The probe produced by the method of claim 87. 
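
The nucleotide ranges recited in claims 77, 83, 85 and 87 are stated as inclusive positions. As a worked illustration, the sketch below computes the length of each range and converts it to a zero-based slice; it assumes 1-based inclusive numbering, and the toy string used to demonstrate slicing is hypothetical and stands in for the Table 2 sequence.

```python
# Ranges copied from claims 77, 83, 85 and 87 (assumed 1-based, inclusive).
REGIONS = {
    "claim 77": (146, 5359),
    "claim 83": (5337, 7573),
    "claim 85": (5346, 6935),
    "claim 87": (6938, 7573),
}


def span_length(start, end):
    """Number of nucleotides in an inclusive 1-based range."""
    return end - start + 1


def as_slice(start, end):
    """Convert an inclusive 1-based range to a Python slice."""
    return slice(start - 1, end)


for name, (start, end) in REGIONS.items():
    length = span_length(start, end)
    whole_codons, remainder = divmod(length, 3)
    print(f"{name}: {length} nt, {whole_codons} codons, {remainder} nt left over")

# Slicing demonstration on a hypothetical 20-base string.
toy = "ACGT" * 5
print(toy[as_slice(2, 4)])
```

Three of the four ranges are exact multiples of three nucleotides; the 5337 through 7573 range is not, which is consistent with it spanning more than one reading frame.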

89. A method of making a probe to detect Norwalk-related 
viruses, comprising the steps of: 

selecting one or more nucleotide sequences from the group 
consisting of GTTGCTGTTGGCATTAACA, 
TAGTGGCATGGGTATTTC, ATTATAGTTTCTTGCATA, 
AGCAAAGTCATACATGAAAT, and ACTCACCCAAATCCTCCA; 

producing said nucleotide sequence by chemical methods or 
in an expression system. 

90. The probe produced by the method of claim 89. 

91. A kit for detecting an immune response to Norwalk virus, 
comprising: 

a container including a protein encoded by the Norwalk virus 
genome shown in Table 2 or fragments or derivatives thereof. 

92. The kit of claim 91, wherein said protein is selected from the 
group consisting of the protein encoded by nucleotides 1 through 7753, the 
protein encoded by nucleotides 146 through 5359, the protein encoded by 
nucleotides 5337 through 7573, the protein encoded by nucleotides 5346 
through 6935, the protein encoded by nucleotides 6938 through 7573 and 
any combination thereof. 

93. A kit for detecting an immune response to a Norwalk-related 
virus, comprising: 

a container including a protein encoded by the genome for 
said Norwalk-related virus. 



94. A method of detecting an immune response to Norwalk virus, 
comprising the steps of: 

collecting a serum sample from an individual suspected of 
having been exposed to Norwalk virus; 

selecting a protein encoded by the Norwalk virus genome 
shown in Table 2 or fragments or derivatives thereof; 

adding said selected protein to said serum in a diagnostic 
assay under conditions allowing said selected protein and the serum 
to react; and 

measuring the amount of reaction of said serum and said 
selected protein. 

95. The method of claim 94, wherein said diagnostic assay is 
selected from the group consisting of enzyme-linked immunosorbent 
assays, radioimmunoassays and immunoblots. 



96. The method of claim 94, wherein said selected protein is a 
capsid protein. 

97. The method of claim 94, wherein said selected protein has 
the intrinsic property of being able to form particle(s). 

98. The method of claim 94, wherein said selected protein is 
selected from the group consisting of the protein encoded by nucleotides 
1 through 7753, the protein encoded by nucleotides 146 through 5359, the 
protein encoded by nucleotides 5337 through 7573, the protein encoded by 
nucleotides 5346 through 6935, the protein encoded by nucleotides 6938 
through 7573 and any combination thereof. 

99. A diagnostic assay to detect an immune response to Norwalk 
virus, comprising: 

selecting a protein encoded in the Norwalk virus genome shown 
in Table 2 or fragments or derivatives thereof; 
using said protein as an antigen; 

adding post-infection serum from a Norwalk-infected 
individual under conditions allowing said serum to react with said 
antigen; and 

measuring the amount of reaction of said serum and said 
antigen. 

100. The method of claim 99, wherein said protein is a capsid 
protein. 

101. The method of claim 99, wherein said protein has the 
intrinsic property of being able to form particle(s). 

102. The method of claim 99, wherein said protein is selected from 
the group consisting of the protein encoded by nucleotides 1 through 7753, 
the protein encoded by nucleotides 146 through 5359, the protein encoded 
by nucleotides 5337 through 7573, the protein encoded by nucleotides 5346 
through 6935, the protein encoded by nucleotides 6938 through 7573 and 
any combination thereof. 

103. A kit for detecting Norwalk viruses and Norwalk-related 
viruses, comprising: 

a container including at least one antiserum made from a 
protein encoded by the Norwalk virus genome shown in Table 2 or 
from a fragment or derivative of said genome. 

104. The kit of claim 102, wherein said protein is selected from 
the group consisting of the protein encoded by nucleotides 1 through 7753, 
the protein encoded by nucleotides 146 through 5359, the protein encoded 
by nucleotides 5337 through 7573, the protein encoded by nucleotides 
5346 through 6935, the protein encoded by nucleotides 6938 through 7573 
and any combination thereof. 



105. A method of producing antibodies to Norwalk and Norwalk- 
related viruses, comprising: 

immunizing animals with a protein encoded by the Norwalk 
virus genome shown in Table 2 or fragments or derivatives thereof. 

106. The method of claim 105, wherein said protein is selected 
from the group consisting of the protein encoded by nucleotides 1 through 
7753, the protein encoded by nucleotides 146 through 5359, the protein 
encoded by nucleotides 5337 through 7573, the protein encoded by 
nucleotides 5346 through 6935, the protein encoded by nucleotides 6938 
through 7573 and any combination thereof. 



107. A vaccine for Norwalk virus, comprising: 

a Norwalk virus antigen encoded by the cDNA sequence of 
Norwalk virus shown in Table 2 or fragments or derivatives 
thereof. 



108. The vaccine of claim 107, wherein said antigen is produced 
using nucleotides 146 through 5359 of the Norwalk virus genome shown 
in Table 2 or a derivative thereof. 

109. The vaccine of claim 107, wherein said antigen is produced 
using nucleotides 5337 through 7573 of the Norwalk virus genome shown 
in Table 2 or a derivative thereof. 

110. The vaccine of claim 107, wherein said antigen is produced 
using nucleotides 5346 through 6935 of the Norwalk virus genome shown 
in Table 2 or a derivative thereof. 

111. The vaccine of claim 107, wherein said antigen is produced 
using nucleotides 6938 through 7573 of the Norwalk virus genome shown 
in Table 2 or a derivative thereof. 

112. The vaccine of claim 107, wherein said antigen has the 
intrinsic property of being able to form particle(s). 

113. A method of immunizing an individual against Norwalk 
virus, comprising the step of: 

orally or parenterally administering an immunologically 
effective dose(s) of the vaccine of claim 107. 

114. A method of immunizing an individual against Norwalk 
virus, comprising the steps of: 

orally and parenterally administering an immunologically 
effective dose of the vaccine of claim 107. 



115. A cDNA sequence of the human calicivirus Sapporo genome 
shown in Figure 9 and fragments and derivatives thereof, said fragments 
and derivatives having sufficient size and nucleotide homology to bind a 
Norwalk or Norwalk-related virus genome. 

116. A protein encoded by nucleotides including nucleotides 1 
through 551 of the human calicivirus Sapporo genome shown in Figure 9 
or fragments or derivatives thereof. 

117. A cDNA subclone of the human calicivirus Sapporo genome 
comprising nucleotides 1 through 149 and fragments and derivatives 
thereof, said fragments and derivatives having sufficient size and 
nucleotide homology to bind a Norwalk or Norwalk-related virus genome. 

118. A cDNA subclone of the human calicivirus Sapporo genome 
comprising nucleotides 113 through 551 and fragments and derivatives 
thereof, said fragments and derivatives having sufficient size and 
nucleotide homology to bind a Norwalk or Norwalk-related virus genome. 

119. A cDNA sequence of the Day care calicivirus genome shown 
in Figure 9 and fragments and derivatives thereof, said fragments and 
derivatives having sufficient size and nucleotide homology to bind a 
Norwalk or Norwalk-related virus genome. 

120. A cDNA sequence of the SRSV/KY/89 genome shown in 
Figure 12 and fragments and derivatives thereof, said fragments and 
derivatives having sufficient size and nucleotide homology to bind a 
Norwalk or Norwalk-related virus genome. 

121. A cDNA sequence of the human calicivirus Houston shown 
in Table 10 and fragments and derivatives thereof, said fragments and 
derivatives having sufficient size and nucleotide homology to bind a 
Norwalk or Norwalk-related virus genome. 

122. A cDNA subclone of a primate calicivirus comprising the 
sequence TGGACGGACC TGCTGTTGAA GATCTCTTCA 
AANGGCTCGA ACGACCAAAG CACGATCGGT ATTGTGTTGA 
CTACGCAAAG TGGGACTCAA CCCANCCACCA AAAGTAACAT 
CCAATCAATN GACATC and fragments and derivatives thereof, said 
fragments and derivatives having sufficient size and nucleotide homology 
to bind a Norwalk or Norwalk-related virus genome. 

123. A cDNA subclone of a primate calicivirus comprising the 
sequence GTGANATGNN ACATCTTCGA CTCGATGGAC CTATTCACAT 
ATGGTGATGA CGGTGTCTAC ATCGTCCCAC CACTATATCA 
TCTGTCATGC CCAAGTCTTC ACCAACCTGA AAC and fragments and 
derivatives thereof, said fragments and derivatives having sufficient size 
and nucleotide homology to bind a Norwalk or Norwalk-related virus 
genome. 
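
The sequences of claims 122 and 123 contain undetermined positions written as N, and the claims do not state a numeric threshold for "sufficient" homology. Purely as an illustration, the sketch below computes percent identity between two equal-length, pre-aligned sequences while treating N as compatible with any base; both that treatment of N and the comparison sequences are assumptions made here, not definitions taken from this application.

```python
def identity(seq_a, seq_b):
    """Fraction of positions that agree between two equal-length, pre-aligned sequences.

    A position counts as agreeing when the bases are equal or either base is N
    (an assumption: N is treated as compatible with any base).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b or "N" in (a, b))
    return matches / len(seq_a)


# "AANGGCTCGA" is a 10-base block from claim 122; the second sequence is hypothetical.
print(identity("AANGGCTCGA", "AATGGCTCGA"))
print(identity("ACGT", "ACGA"))
```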

124. A method of detecting an immune response to Norwalk or a 
Norwalk-related virus, comprising the steps of: 

collecting a serum sample from an individual suspected of 
having been exposed to Norwalk or a Norwalk-related virus; 
selecting a protein encoded by the genomic sequence of a 
Norwalk-related virus or fragments or derivatives thereof, said 
fragments and derivatives having sufficient size and nucleotide 
homology to bind a Norwalk or Norwalk-related virus genome; 

adding said selected protein to said serum in a diagnostic 
assay under conditions allowing the selected protein and the serum 
to react; and 

measuring the amount of reaction of said serum and said 
selected protein. 

125. The method of claim 124, wherein said diagnostic assay is 
selected from the group consisting of enzyme-linked immunosorbent 
assays, radioimmunoassays and immunoblots. 

126. The method of claim 124, wherein said genomic sequence is 
the cDNA sequence of claim 117. 

127. The method of claim 124, wherein said genomic sequence is 
the cDNA sequence of claim 119. 

128. The method of claim 124, wherein said genomic sequence is 
the cDNA sequence of claim 120. 

129. The method of claim 124, wherein said genomic sequence is 
the cDNA sequence of claim 121. 

130. The method of claim 124, wherein said genomic sequence is 
the cDNA sequence of claim 122. 

131. The method of claim 124, wherein said genomic sequence is 
the cDNA sequence of claim 123. 

132. A kit for detecting Norwalk viruses and Norwalk-related 
viruses, comprising: 

a container including at least one antiserum made from a 
protein encoded by a genomic sequence of a Norwalk-related virus 
genome or from a fragment or derivative of said genomic sequence, 
said fragments and derivatives having sufficient size and nucleotide 
homology to bind a Norwalk or Norwalk-related virus genome. 

133. The kit of claim 132, wherein said genomic sequence is the 
cDNA sequence of claim 117. 

134. The kit of claim 132, wherein said genomic sequence is the 
cDNA sequence of claim 119. 

135. The kit of claim 132, wherein said genomic sequence is the 
cDNA sequence of claim 120. 

136. The kit of claim 132, wherein said genomic sequence is the 
cDNA sequence of claim 121. 

137. The kit of claim 132, wherein said genomic sequence is the 
cDNA sequence of claim 122. 

138. The kit of claim 132, wherein said genomic sequence is the 
cDNA sequence of claim 123. 

139. A chimeric protein, comprising: 

a protein encoded by a Norwalk virus genome combined with 
a protein encoded by a genome of a Norwalk-related virus. 

140. A method of detecting an immune response to Norwalk virus, 
comprising the steps of: 

collecting a serum sample from an individual suspected of 
having been exposed to Norwalk virus; 
adding the chimeric protein of claim 139 to said serum 
in a diagnostic assay under conditions allowing said chimeric protein 
and the serum to react; and 

measuring the amount of reaction of said serum and said 
chimeric protein. 

141. A vaccine for Norwalk or Norwalk-related viruses, 
comprising: 

the chimeric protein of claim 139 used as an antigen. 

142. A kit for detecting Norwalk or Norwalk-related 
viruses, comprising: 
a container including at least one antiserum made from the 
chimeric protein of claim 139. 

                           21                         41
G TGC TCT GGG AGC GGG CAT ACA GGT TGG TGG CGA CAG GCC CTC CAA 
  cys ser gly ser gly his thr gly trp trp arg gln ala leu gln 

                    61                         81
  AGC CAA AGG TAT CAA CAA AAT TTG CAA CTG CAA GAA AAT TCT TTT 
  ser gln arg tyr gln gln asn leu gln leu gln glu asn ser phe 

              101                       121
  AAA CAT GAC AGG GAA ATG ATT GGG TAT CAG GTT GAA GCT TCA AAT 
  lys his asp arg glu met ile gly tyr gln val glu ala ser asn 

       141                        161                       181
  CAA TTA TTG GCT AAA AAT TTG GCA ACT AGA TAT TCA CTC CTC CGT 
  gln leu leu ala lys asn leu ala thr arg tyr ser leu leu arg 

                           201                        221
  GCT GGG GGT TTG ACC ACT GCT GAT GCA GCA AGA TCT GTG GCA GGA* 
  ala gly gly leu thr ser ala asp ala ala arg ser val ala gly 

                    241                        261
  GCT CCA GTC ACC CGC ATT GTA GAT TGG AAT GGC GTG AGA GTG TCT 
  ala pro val thr arg ile val asp trp asn gly val arg val ser 

              281                       301
  GCT CCC GAG TCC TCT GCT ACC ACA TTG AGA TCC GGT GGC TTC ATG 
  ala pro glu ser ser ala thr thr leu arg ser gly gly phe met 

       321                        341                       361
  TCA GTT CCC ATA CCA TTT GCC TCT AAG CAA AAA CAG GTT CAA TCA 
  ser val pro ile pro phe ala ser lys gln lys gln val gln ser 

                           381                        401
  TCT GGT ATT ACT AAT CCA AAT TAT TCC CCT TCA TCC ATT TCT CGA 
  ser gly ile ser asn pro asn tyr ser pro ser ser ile ser arg 

                    421                        441
  ACC ACT ACT TGG GTC GAG TCA CAA AAC TCA TCG AGA TTT GGA AAT 
  thr thr ser trp val glu ser gln asn ser ser arg phe gly asn 

              461                       481
  CTT TCT CCA TAC CAC GCG GAG GCT CTC AAT ACA GTG TGG TTG ACT 
  leu ser pro tyr his ala glu ala leu asn thr val trp leu thr 

       501
  CCA CCC GGT TCA ACC 
  pro pro gly ser thr 
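
The amino acid rows of the sequence figure above follow from the codon rows by the standard genetic code. As a minimal sketch of that correspondence, the code below translates the codons of the figure's first row into three-letter abbreviations; it assumes the standard code and a reading frame that starts at the second nucleotide, as the figure's layout suggests, and the helper names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "ala", "R": "arg", "N": "asn", "D": "asp", "C": "cys", "Q": "gln",
    "E": "glu", "G": "gly", "H": "his", "I": "ile", "L": "leu", "K": "lys",
    "M": "met", "F": "phe", "P": "pro", "S": "ser", "T": "thr", "W": "trp",
    "Y": "tyr", "V": "val", "*": "***",
}


def translate(codons):
    """Translate space-separated codons into three-letter amino acid abbreviations."""
    return " ".join(THREE_LETTER[CODON_TABLE[c]] for c in codons.split())


# Codons of the first row of the figure (the leading unpaired G is omitted).
row1 = "TGC TCT GGG AGC GGG CAT ACA GGT TGG TGG CGA CAG GCC CTC CAA"
print(translate(row1))
```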
Figure 14a 



ATGCACTTCA CAGGTGAATA GCATCAACCA 
ATGTACCTCA CAAGTGAACA GCATCAATCA 
CTGCACATCA CAGTGGAATT CCA-TGCCCA 
CTGCACCTCA CAGTGGAACT CCATTGCCCA 
TTGCACCTCA CAGTGGAACT CCATTGCCCT 
ATGTACTTCC CAGGTGAACA GCATAAATCA 
CTGCACTTCT CAAGTAAATA GCATAAATCA 
CTGCACTTCT CAAGTGAATA GCATAAATCA 

4* ** *• ** * + ** * 

ACTCTATGTG CATTGTCAGA AGTCACTGGC 
ACCTTGTGGG GGGTATGAGA AGTmCTGGT 
ACACTCTGTG CACTATCTGA AGTCACAAAC 
ACTCTGTGTG CCCTTTCTGA AGTGACAGGA 
ACTCTGTGTG CCCTTTCTGA AGTGACAGGA 
ACTCTCTGTG CACTGTCTGA GGCCACTGGT 
ACCCTTTGTG CACTGTCTGA GGCTACTGGC 
ACCCTTTGTG CATTGTCTGA GGCTACTGGC 
*+ * ** * * *# ** + ** 

ATGTGATACA ATCACAATCT TATTTCTCAT 
ATGTAATACA GTCACAATCT TACTTTTCAT 
ACATCATACA AGCTAACTCC TTGTTCTCTT 
ACATCATACA AGCTAATTCC ATGTACTCTT 
ACATCATACA AGCTAATTCC ATGTACTCTT 
ATGTGGTGCA ATCCATGTCA TATTTCTCAT 
ATGTGGTGCA GTCCATGTCA TACTTCTCAT 
ATGTGGTACA GTCCATGTCA TACTTCTCAT 
***** * ** * ** * 
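
The marker rows beneath each block of Figure 14a appear to flag aligned positions that are shared among the sequences. The sketch below shows one way such a row could be derived, marking with an asterisk every column in which all rows carry the same base; it assumes that reading of the markers (the figure's other symbols are not interpreted), and it uses only the first ten positions of the first three rows as input.

```python
def conservation_row(rows):
    """Return a string with '*' under columns where every row has the same character."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must have equal length")
    return "".join("*" if len(set(column)) == 1 else " " for column in zip(*rows))


# First ten positions of the first three rows of Figure 14a.
rows = [
    "ATGCACTTCA",
    "ATGTACCTCA",
    "CTGCACATCA",
]

for row in rows:
    print(row)
print(conservation_row(rows))
```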